International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-8 Issue-75 February-2021
Paper Title : Development of data-to-text (D2T) on generic data using fuzzy sets
Author Name : Lala Septem Riza, Muhammad Ridwan, Enjun Junaeti and Khyrina Airin Fariza Abu Samah
Abstract :

Data-to-text (D2T) systems translate non-linguistic data into textual form. As technology develops, however, D2T development must account for the growing variety of data domains and of users. This study develops a D2T system that takes generic data as input, so that it can accept data from any field or domain, whether or not the data come with header information, data types, or rules. Fuzzy rule-based systems are then used to interpret the data in a general way. The system produces three kinds of information: data summaries, information on the newest data, and predictions. It is implemented in the R programming language using several available packages. The experiments measure the readability of the generated news and the computation time, and compare the results with related research. The results show that the generated information represents the input data and can be understood by students, even at the elementary school level, and that the computation time is quite good.
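
The idea of interpreting a generic numeric column with fuzzy sets and a linear trend can be illustrated with a minimal sketch. It is not the authors' R implementation: it is written in Python for brevity, and the function names, the three triangular sets ("low", "medium", "high") spread over the column's own range, and the sentence template are all assumptions made for illustration.

```python
from statistics import mean

def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def label(x, values):
    """Pick the linguistic term with the highest membership for x.

    Assumption: three sets derived from the column's min and max, so no
    domain knowledge (headers, units, rules) is required.
    """
    lo, hi = min(values), max(values)
    mid = (lo + hi) / 2
    span = (hi - lo) or 1.0  # avoid a zero-width range for constant columns
    sets = {
        "low": (lo - span / 2, lo, mid),
        "medium": (lo, mid, hi),
        "high": (mid, hi, hi + span / 2),
    }
    return max(sets, key=lambda term: triangular(x, *sets[term]))

def slope(values):
    """Least-squares slope of the values against their index 0..n-1."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    den = sum((x - x_bar) ** 2 for x in xs)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    return num / den if den else 0.0

def summarize(name, values):
    """Fill a fixed sentence template (hypothetical wording)."""
    s = slope(values)
    trend = "rising" if s > 0 else "falling" if s < 0 else "stable"
    latest = values[-1]
    return f"The latest {name} is {label(latest, values)} ({latest}) and the series is {trend}."

if __name__ == "__main__":
    print(summarize("value", [10, 12, 15, 19, 24]))
```

Because the sets are derived from the data range alone, the same call works on a column from any domain; a system with domain rules available would replace the range-based sets with those rules.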

Keywords : Data-to-text, Natural language generation, Machine learning, General purpose, General corpora, Fuzzy rule-based system, Time-series analysis, Linear regression, Knuth-Morris-Pratt.
Cite this article : Riza LS, Ridwan M, Junaeti E, Samah KA. Development of data-to-text (D2T) on generic data using fuzzy sets. International Journal of Advanced Technology and Engineering Exploration. 2021; 8(75):382-390. DOI:10.19101/IJATEE.2020.762134.