(Publisher of Peer Reviewed Open Access Journals)
ICETT-2012
Paper Title : Template Extraction from Heterogeneous Web Pages
Author Name : Trupti B. Mane, Girish P. Potdar
Abstract : The World Wide Web (WWW) attracts growing attention as it becomes a huge repository of information. Web pages are typically deployed on a website through a web template system, and such templates can be used by any individual or organization to set up a website. Templates also give readers easy access to content by presenting it in a consistent structure. As web templates become more important, template detection techniques are emerging. Earlier systems assume that all documents conform to a single common template and extract the template under that assumption, which is not feasible in real applications. Our focus is on extracting templates from heterogeneous web pages. Because web documents vary widely, an unknown number of templates must be managed. This can be achieved by clustering the web documents with a good partitioning method. The correctness of the extracted templates depends on the quality of the clustering.
Keywords : Template extraction, Clustering, Data mining, Information search and retrieval.
Cite this article : Trupti B. Mane, Girish P. Potdar, "Template Extraction from Heterogeneous Web Pages", ICETT-2012, Page No : 193-196.