International Journal of Advanced Computer Research ISSN (Print): 2249-7277    ISSN (Online): 2277-7970 Volume-8 Issue-35 March-2018
Paper Title:
Spatial distribution analysis of unigrams and bigrams of Hindi literary document
Author Name:
Sifatullah Siddiqi
Abstract:
This paper presents a spatial distribution analysis of “Godan”, a famous Hindi literary work by the novelist Munshi Premchand. We attempt a comprehensive spatial distribution analysis of different kinds of words (unigrams) and word pairs (bigrams) in the document. Single words are divided into stop words, keywords and non-keywords, while word pairs are divided into stop-phrases, key phrases and non-key phrases. Our proposition is that the spatial distribution pattern differs between types of unigrams and between types of bigrams, and that unigrams and bigrams of the same type show significantly similar spatial distribution patterns. We select a number of example words from the text and generate their spatial distribution graphs to support this assertion.
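The idea of a spatial distribution can be illustrated with a minimal sketch. The abstract does not state the procedure used in the paper, so the code below is only an assumption of one simple reading: the positions at which a unigram or bigram occurs in the token sequence, and the gaps between successive occurrences. The function names `occurrence_positions` and `gaps` and the English sample sentence are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def occurrence_positions(tokens: List[str], n: int = 1) -> Dict[Tuple[str, ...], List[int]]:
    """Map every n-gram to the list of token indices where it starts."""
    positions: Dict[Tuple[str, ...], List[int]] = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        positions.setdefault(gram, []).append(i)
    return positions


def gaps(positions: List[int]) -> List[int]:
    """Distances between successive occurrences of one n-gram."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Hypothetical sample text; whitespace tokenisation is an assumption.
tokens = "the farmer sold the cow and the farmer wept".split()

unigrams = occurrence_positions(tokens, n=1)
bigrams = occurrence_positions(tokens, n=2)

the_positions = unigrams[("the",)]
the_gaps = gaps(the_positions)
the_farmer_positions = bigrams[("the", "farmer")]
```

Plotting the position list of a term against the length of the text gives one possible form of a spatial distribution graph. Under this reading, a term whose occurrences are spread evenly has nearly uniform gaps, while a term whose occurrences cluster has a few large gaps and many small ones.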
Keywords:
Stop words, Keywords, Key phrase, Spatial distribution analysis, Hindi.
Cite this article:
Sifatullah Siddiqi. Spatial distribution analysis of unigrams and bigrams of Hindi literary document. International Journal of Advanced Computer Research. 2018;8(35):97-109. DOI:10.19101/IJACR.2018.835003