ICETT-2012
Paper Title : A Review of Focused Web Crawling Strategies
Author Name : Bireshwar Ganguly, Rahila Sheikh
Abstract : In a highly competitive world, search systems carry a responsibility to save the user's time when looking for information on the web. The volume of indexed data is huge, however, and because users' perspectives differ, standard exhaustive crawling has a significant impact on search quality. A standard crawler starts well from a promising set of initial seed URLs, but its yield of relevant pages declines as the crawl proceeds. This is a major reason why researchers place heavy emphasis on the relevance and robustness of the data found. Users' interests also differ from time to time and from topic to topic; what one user wants is unnecessary to another. This is where focused crawling comes into play. Focused crawlers aim to search and retrieve only the subset of the World Wide Web that pertains to a specific topic. The ideal focused crawler retrieves the maximal set of relevant pages while traversing the minimal number of irrelevant documents. In this paper we review research on several focused web crawling strategies and propose a new technique that assigns credits to web pages according to their semantic content. We also prioritize the frontier queue so that URLs of higher-credit pages are crawled before those of lower-credit pages.
Keywords : Web crawling algorithms, search engine, focused crawling algorithm survey, page rank, Information Retrieval.
Cite this article : Bireshwar Ganguly, Rahila Sheikh, "A Review of Focused Web Crawling Strategies", ICETT-2012, Page No : 252-258.
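
The abstract describes a frontier queue in which URLs of higher-credit pages are crawled first. The Python sketch below illustrates that general idea only; it is not the authors' method. The `credit` function (fraction of topic terms present in the page text), the rule that out-links inherit the credit of the page they were found on, and the names `Frontier`, `crawl`, and `fetch` are all assumptions made for illustration, since the abstract does not specify how semantic credits are computed.

```python
import heapq
import itertools


def credit(text, topic_terms):
    """Hypothetical credit: fraction of topic terms that occur in the text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


class Frontier:
    """Priority queue of URLs; the highest-credit URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: earlier pushes win

    def push(self, url, score):
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def crawl(seeds, fetch, topic_terms, max_pages=10):
    """Crawl from the seeds; fetch(url) must return (text, out_links).

    Assumption: each out-link is enqueued with the credit of the page
    that linked to it.
    """
    frontier = Frontier()
    for url in seeds:
        frontier.push(url, 1.0)  # seeds get the maximum credit
    visited = []
    while len(frontier) and len(visited) < max_pages:
        url, _ = frontier.pop()
        text, links = fetch(url)
        page_credit = credit(text, topic_terms)
        visited.append((url, page_credit))
        for link in links:
            frontier.push(link, page_credit)
    return visited
```

In this sketch `fetch` is supplied by the caller, so the prioritization logic can be tried on an in-memory dictionary of pages before any real HTTP code is written. A real focused crawler would replace the keyword-overlap credit with a proper semantic relevance measure.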