A novel web crawling method for vertical search engines

Abstract

The main goal of a focused web crawler is to retrieve as many relevant pages as possible. However, most crawlers use the PageRank algorithm to order the pages in the crawler frontier. Because PageRank suffers from the “rich get richer” phenomenon, focused crawlers often fail to retrieve relevant pages that are hidden. This paper presents a novel approach for retrieving hidden, relevant pages by combining rank and semantic similarity information. The model is validated by crawling the real web on different topics, and the results are promising.
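
The idea of ordering the frontier by both rank and semantic similarity can be illustrated with a minimal sketch. The abstract does not say how the two signals are combined, so everything below is an assumption made for illustration: the linear weighting with `alpha`, the bag-of-words cosine similarity standing in for the semantic measure, a rank value already normalized to [0, 1], and the names `Frontier`, `combined_score`, and `cosine_similarity`.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts.

    A simple stand-in for a semantic similarity measure (assumption).
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def combined_score(rank: float, similarity: float, alpha: float = 0.5) -> float:
    """Hypothetical linear mix of link-based rank and topic similarity.

    Both inputs are assumed to lie in [0, 1]; alpha is an illustrative weight.
    """
    return alpha * rank + (1.0 - alpha) * similarity


class Frontier:
    """Crawler frontier that pops the URL with the highest combined score."""

    def __init__(self, topic: str, alpha: float = 0.5) -> None:
        self.topic = topic
        self.alpha = alpha
        self._heap: list[tuple[float, str]] = []
        self._seen: set[str] = set()

    def push(self, url: str, rank: float, context: str) -> None:
        """Queue a URL once, scored from its rank and its surrounding text."""
        if url in self._seen:
            return
        self._seen.add(url)
        similarity = cosine_similarity(self.topic, context)
        score = combined_score(rank, similarity, self.alpha)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, url))

    def pop(self) -> str:
        """Return the queued URL with the highest combined score."""
        _, url = heapq.heappop(self._heap)
        return url

    def __len__(self) -> int:
        return len(self._heap)
```

With `alpha = 0.5`, a low-rank page whose surrounding text closely matches the topic can be popped before a high-rank page that is off topic. Setting `alpha = 1.0` reduces the ordering to rank alone, which is the behaviour the abstract criticizes.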
