I developed a new web crawler algorithm in Java as part of my thesis. Now I need to compare my work with previously developed crawler algorithms. The problem is that every published algorithm was tested in its own web server environment, which differs from the environments used for the other crawlers. In addition, some of these algorithms use just a single URL as the starting seed, while others crawl from multiple seeds, so comparing them under the same standards would not be fair.
I need to know how to find (or what are) the common features that would allow me to compare previous algorithms such as BFS [1], OPIC [2], and PageRank [3] with the one I developed, and what the key measure is for judging quality when developing a new web crawling algorithm.
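For example, [1] evaluates crawl orderings by how quickly the crawl accumulates high-PageRank pages. Below is a minimal Java sketch of that kind of measurement, assuming every algorithm is run against the same graph snapshot, the same seed set, and the same page budget, and that a reference importance score (e.g. PageRank over the full graph) has been computed offline. The class and method names are just placeholders, not part of any of the cited systems:

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch: given the order in which a crawler fetched pages and an offline
 * "gold" importance score for every page, report the cumulative fraction of
 * total importance captured after each checkpoint. Running every algorithm
 * on the same graph, seeds, and budget makes the resulting curves comparable.
 */
public class CrawlQualityCurve {

    /**
     * @param crawlOrder page IDs in the order the crawler fetched them
     * @param goldScore  precomputed importance score per page ID
     * @param checkpoint report cumulative quality every this many pages
     */
    public static double[] cumulativeQuality(List<String> crawlOrder,
                                             Map<String, Double> goldScore,
                                             int checkpoint) {
        double total = goldScore.values().stream()
                                .mapToDouble(Double::doubleValue).sum();
        int points = (crawlOrder.size() + checkpoint - 1) / checkpoint;
        double[] curve = new double[points];

        double captured = 0.0;
        for (int i = 0; i < crawlOrder.size(); i++) {
            captured += goldScore.getOrDefault(crawlOrder.get(i), 0.0);
            // Record the fraction captured at each checkpoint and at the end.
            if ((i + 1) % checkpoint == 0 || i == crawlOrder.size() - 1) {
                curve[i / checkpoint] = captured / total;
            }
        }
        return curve;
    }

    public static void main(String[] args) {
        // Toy example with hypothetical page IDs and scores.
        List<String> order = List.of("p3", "p1", "p4", "p2");
        Map<String, Double> gold = Map.of("p1", 0.4, "p2", 0.1,
                                          "p3", 0.3, "p4", 0.2);
        for (double q : cumulativeQuality(order, gold, 2)) {
            System.out.printf("%.2f%n", q);
        }
    }
}
```

Is this kind of "importance captured versus pages crawled" curve an accepted common ground for comparing crawlers that were originally evaluated in different environments and with different seed sets, or is there a better standard measure?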
References:
[1] Najork, Marc, and Janet L. Wiener. "Breadth-first crawling yields high-quality pages." Proceedings of the 10th International Conference on World Wide Web. ACM, 2001.
[2] Abiteboul, Serge, Mihai Preda, and Gregory Cobena. "Adaptive on-line page importance computation." Proceedings of the 12th International Conference on World Wide Web. ACM, 2003.
[3] Page, Lawrence, et al. "The PageRank citation ranking: Bringing order to the web." (1999).
Thanks in advance