(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-8 Issue-35 March-2018
Full-Text PDF
DOI:10.19101/IJACR.2018.835003
Paper Title : Spatial distribution analysis of unigrams and bigrams of hindi literary document
Author Name : Sifatullah Siddiqi
Abstract :

In this paper the spatial distribution analysis of a very famous Hindi literary document “Godan” authored by the great novelist Munshi Premchand has been presented. We have attempted to perform a thorough and comprehensive spatial distribution analysis of different kinds of words (unigram) and word pairs (bigrams) in the document. Single words have been divided into stop words, keywords and non-keywords while word pairs have been divided into stop-phrases, key phrases and non-key phrases. Our proposition is that the nature of the spatial distribution pattern of different types of unigrams and bigrams in the text is different and there is a significant similarity between spatial distribution patterns for the unigrams and bigrams of same type. In this paper, we have selected a lot of example words from the text and generated their spatial distribution graphs to prove our assertion.

Keywords : Stop words, Keywords, Key phrase, Spatial distribution analysis, Hindi.
Cite this article : Sifatullah Siddiqi, " Spatial distribution analysis of unigrams and bigrams of hindi literary document " , International Journal of Advanced Computer Research (IJACR), Volume-8, Issue-35, March-2018 ,pp.97-109.DOI:10.19101/IJACR.2018.835003
References :
[1]Luhn HP. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development. 1957; 1(4):309-17.
[Crossref] [Google Scholar]
[2]Ortuno M, Carpena P, Bernaola-Galván P, Munoz E, Somoza AM. Keyword detection in natural languages and DNA. Europhysics Letters. 2002; 57(5):759-64.
[Google Scholar]
[3]Herrera JP, Pury PA. Statistical keyword detection in literary corpora. The European Physical Journal B. 2008; 63(1):135-46.
[Crossref] [Google Scholar]
[4]Carpena P, Bernaola-Galván P, Hackenberg M, Coronado AV, Oliver JL. Level statistics of words: finding keywords in literary texts and symbolic sequences. Physical Review E. 2009; 79(3):1-4.
[Crossref] [Google Scholar]
[5]Mehri A, Darooneh AH. Keyword extraction by nonextensivity measure. Physical Review E. 2011; 83(5):1-6.
[Crossref] [Google Scholar]
[6]Carretero-Campos C, Bernaola-Galván P, Coronado AV, Carpena P. Improving statistical keyword detection in short texts: entropic and clustering approaches. Physica A: Statistical Mechanics and its Applications. 2013; 392(6):1481-92.
[Crossref] [Google Scholar]
[7]Yang Z, Lei J, Fan K, Lai Y. Keyword extraction by entropy difference between the intrinsic and extrinsic mode. Physica A: Statistical Mechanics and its Applications. 2013; 392(19):4523-31.
[Crossref] [Google Scholar]
[8]Siddiqi S, Sharan A. Keyword extraction from single documents using mean word intermediate distance. International Journal of Advanced Computer Research. 2016; 6(25):138-45.
[Crossref] [Google Scholar]
[9]Sharan A, Siddiqi S, Singh J. Keyword extraction from Hindi documents using statistical approach. In intelligent computing, communication and devices 2015 (pp. 507-13). Springer, New Delhi.
[Crossref] [Google Scholar]
[10]Siddiqi S, Sharan A. Keyword and keyphrase extraction from single Hindi document using statistical approach. In international conference on signal processing and integrated networks 2015 (pp. 713-8). IEEE.
[Crossref] [Google Scholar]