International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-7 Issue-30 May-2017
DOI:10.19101/IJACR.2017.730020
Paper Title : A subject identification method based on term frequency technique
Author Name : Nurul Syafidah Jamil, Ku Ruhana Ku-Mahamud, Aniza Mohamed Din, Faudziah Ahmad, Noraziah ChePa, Wan Hussain Wan Ishak, Roshidi Din and Farzana Kabir Ahmad
Abstract :

Analyzing text documents and extracting important information from them is crucial, and it has generated interest in the areas of text mining and information retrieval. This process is used to pick out particular information in the text. Readers tend to read almost everything in a text document in order to find some specific information. However, reading a text document takes time, and extracting information from it takes additional time. Thus, classifying text under a subject can guide a person to relevant information. In this paper, a subject identification method based on term frequency is proposed to categorize groups of text into a particular subject. Since term frequency tends to ignore the semantics of a document, a term extraction algorithm is introduced to improve the relevance of the terms extracted from the text. Evaluation of the extracted terms shows that the proposed method outperforms other extraction techniques.

Keywords : Subject identification, Text classification, Term frequency, Term filtering, Text document.
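
The general idea of term-frequency-based subject identification can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose term extraction step is not described in the abstract. It only shows the baseline technique of counting filtered terms and matching them against subject vocabularies. The stop-word list, the `SUBJECT_TERMS` vocabularies, and the function names are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list and subject vocabularies (illustration only,
# not taken from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "for", "on", "with", "by"}
SUBJECT_TERMS = {
    "sports": {"match", "team", "goal", "player", "score"},
    "finance": {"bank", "loan", "interest", "market", "stock"},
}


def term_frequencies(text):
    """Count how often each term occurs, after dropping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def identify_subject(text):
    """Return the subject whose vocabulary has the highest summed
    term frequency in the text, or None if no vocabulary term occurs."""
    tf = term_frequencies(text)
    scores = {subject: sum(tf[t] for t in terms)
              for subject, terms in SUBJECT_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_subject("The team scored a goal and the player won the match."))
```

Because this sketch matches exact word forms only, "scored" does not count toward "score". That is one example of the semantic blindness of raw term frequency that the abstract says the proposed term extraction step is meant to address.
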
Cite this article : Nurul Syafidah Jamil, Ku Ruhana Ku-Mahamud, Aniza Mohamed Din, Faudziah Ahmad, Noraziah ChePa, Wan Hussain Wan Ishak, Roshidi Din and Farzana Kabir Ahmad, "A subject identification method based on term frequency technique", International Journal of Advanced Computer Research (IJACR), Volume-7, Issue-30, May-2017, pp.103-110. DOI:10.19101/IJACR.2017.730020