ACCENTS Journals

A Unit of ACCENTS

(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Computer Research (IJACR)

ISSN (Print): 2249-7277; ISSN (Online): 2277-7970

Volume-8 Issue-35 March-2018

DOI:10.19101/IJACR.2018.835003

Paper Title

Spatial distribution analysis of unigrams and bigrams of Hindi literary document

Author Name

Sifatullah Siddiqi

Abstract

This paper presents a spatial distribution analysis of the well-known Hindi literary document “Godan”, written by the novelist Munshi Premchand. We attempt a comprehensive analysis of how different kinds of words (unigrams) and word pairs (bigrams) are distributed through the document. Single words are divided into stop words, keywords and non-keywords, while word pairs are divided into stop-phrases, key phrases and non-key phrases. Our proposition is twofold: the spatial distribution pattern differs between the different types of unigrams and bigrams in the text, and unigrams and bigrams of the same type show significantly similar spatial distribution patterns. To support this proposition, we select many example words from the text and generate their spatial distribution graphs.
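As a rough illustration of what a spatial distribution of a unigram or bigram can mean, the minimal sketch below records the token positions at which each n-gram occurs and the gaps between successive occurrences. It is not the paper's implementation, which is not described on this page. The function names `occurrence_positions` and `gaps` are hypothetical, and the transliterated toy sentence is invented for illustration and is not taken from “Godan”.

```python
from collections import defaultdict


def occurrence_positions(tokens, n=1):
    """Map each n-gram (as a tuple of tokens) to the token indices where it starts."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    return dict(positions)


def gaps(position_list):
    """Return the distances between successive occurrences of one n-gram."""
    return [b - a for a, b in zip(position_list, position_list[1:])]


# Toy, invented token sequence (assumption for illustration only).
tokens = "hori ne gaay li aur hori ne dhaniya se kaha ki gaay acchi hai".split()

unigram_pos = occurrence_positions(tokens, n=1)
bigram_pos = occurrence_positions(tokens, n=2)

print(unigram_pos[("gaay",)], gaps(unigram_pos[("gaay",)]))
print(bigram_pos[("hori", "ne")], gaps(bigram_pos[("hori", "ne")]))
```

Plotting the position lists along the length of the document gives one simple kind of spatial distribution graph. Under this view, a term whose occurrences are spread evenly has fairly uniform gaps, while a term that occurs in clusters has a mix of short and long gaps.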

Keywords

Stop words, Keywords, Key phrase, Spatial distribution analysis, Hindi.

Cite this article

Sifatullah Siddiqi, "Spatial distribution analysis of unigrams and bigrams of Hindi literary document", International Journal of Advanced Computer Research (IJACR), Volume-8, Issue-35, March-2018, pp. 97-109. DOI: 10.19101/IJACR.2018.835003
