International Journal of Advanced Computer Research (IJACR)

ISSN (Print): 2249-7277; ISSN (Online): 2277-7970
Volume-6 Issue-25 July-2016
DOI: 10.19101/IJACR.2016.625003
Paper Title: Keyword extraction from single documents using mean word intermediate distance
Authors: Sifatullah Siddiqi and Aditi Sharan
Abstract:

Keyword extraction is an important task in text mining. In this paper, a novel, unsupervised, domain-independent and language-independent approach for automatic keyword extraction from single documents is proposed. We use the word intermediate distance vector and its mean value to extract keywords. We compare our approach against the standard deviation of intermediate distances approach, taken as the baseline, and find heavy overlap between the results of the two approaches. Our approach has the advantage of being faster, especially for long documents, because it removes the need to compute the standard deviation of the word intermediate distance vector. Two well-known works, “Origin of Species” and “A Brief History of Time”, are used to demonstrate the experimental results. Experiments show that the proposed approach works almost as well as the standard deviation approach, and that the percentage overlap between the top 30 keywords extracted by the two approaches is more than 50%.
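The abstract describes the method only at a high level. As a minimal sketch, assuming simple lowercase tokenization and requiring at least three occurrences per word, the Python below builds each word's intermediate distance vector (the gaps between consecutive occurrences), computes the normalized standard deviation s/μ of those gaps as used by the standard-deviation baseline (published variants normalize it further by word frequency), and measures the percentage overlap between two top-k keyword lists. The abstract does not say how the mean is turned into a ranking, so `mean_scores` is a hypothetical stand-in (the gap a uniformly spread word of that frequency would have, divided by the observed mean gap), not the authors' formula; all function names are illustrative.

```python
import re
import statistics
from collections import defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase alphabetic tokens, no stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def intermediate_distances(tokens, min_count=3):
    """Map each word to its intermediate distance vector: the gaps between
    the positions of consecutive occurrences of that word."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return {
        w: [b - a for a, b in zip(p, p[1:])]
        for w, p in positions.items()
        if len(p) >= min_count  # at least two gaps are needed for a std dev
    }


def sigma_scores(dists):
    # Standard-deviation baseline: s / mu, the standard deviation of the
    # gaps divided by their mean (larger means more clustered).
    return {w: statistics.pstdev(d) / statistics.mean(d) for w, d in dists.items()}


def mean_scores(dists, n_tokens):
    # HYPOTHETICAL mean-only score, not taken from the paper: the expected
    # gap for a uniformly spread word with this many occurrences
    # (n_tokens / count) divided by its observed mean gap.
    return {
        w: (n_tokens / (len(d) + 1)) / statistics.mean(d)
        for w, d in dists.items()
    }


def top_k(scores, k):
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def overlap_percent(a, b):
    # Percentage of words shared by two top-k lists of equal length.
    if not a:
        return 0.0
    return 100.0 * len(set(a) & set(b)) / len(a)
```

To run the kind of comparison the abstract reports, one would tokenize a full text such as “Origin of Species”, take `top_k(..., 30)` from each score map, and pass the two lists to `overlap_percent`. The mean-only score needs just the mean of each gap vector, while s/μ also needs the squared deviations, which is consistent with the abstract's point that avoiding the standard deviation saves time on long documents.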

Keywords: Keyword extraction, Mean word intermediate distance, Clustering, Standard deviation.
Cite this article: Sifatullah Siddiqi and Aditi Sharan, "Keyword extraction from single documents using mean word intermediate distance", International Journal of Advanced Computer Research (IJACR), Volume-6, Issue-25, July-2016, pp. 138-145. DOI: 10.19101/IJACR.2016.625003