A data-driven approach to answering customer service e-mail inquiries in the Arabic language
Amjad Almuqati1, Abdulaziz Albarrak2, Qazi Mudasser2 and Heider Wahsheh2
Department of Information Systems, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa, Saudi Arabia2
Corresponding Author: Amjad Almuqati
Received: 09-June-2025; Revised: 19-January-2026; Accepted: 26-January-2026
Abstract
E-mail is a widely used mode of communication in both professional and personal contexts. However, automating customer service responses in the Arabic language remains a challenging task. This study aims to develop a machine learning–based framework for classifying Arabic e-mails and generating appropriate automated responses. Two key innovations are introduced: (i) the release of the first well-described Arabic institutional e-mail dataset, and (ii) the design of a hybrid processing pipeline that integrates supervised classification with term frequency–inverse document frequency (TF-IDF) and cosine similarity–based retrieval. This dual approach directly addresses the underexplored domain of Arabic-language e-mail automation. A design science research (DSR) framework was employed to analyze e-mail records from 9,511 employees at King Faisal University. The e-mails were classified as either inquiries or complaints using supervised learning methods, with particular emphasis on a support vector machine (SVM) classifier. TF-IDF was used for text feature extraction, while cosine similarity measured lexical similarity for response retrieval. Based on the predicted category, the system generated appropriate automated replies. The SVM trained on TF-IDF features achieved an accuracy of 90.4% on the held-out test set, with a precision of 93.5%, a recall of 94.1%, and an F1-score of 93.8%. During five-fold cross-validation, it obtained an average accuracy of 91.8% and an average F1-score of 91.4%. The SVM consistently outperformed logistic regression, naïve Bayes, and random forest classifiers. This study addresses an important research gap in the humanization of Arabic e-mail responses by proposing an effective supervised classification and response generation framework. The high accuracy, precision, and recall indicate that SVM is well suited for categorizing Arabic e-mails.
Overall, the proposed system offers practical tools for Arabic language technology and contributes to the development of automated customer support solutions.
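The retrieval step described in the abstract (TF-IDF features compared by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothed IDF formula, the whitespace tokenizer, and the toy knowledge base with placeholder reply identifiers are all assumptions made for illustration. English toy strings are used here; real Arabic e-mails would likely need normalization and a proper tokenizer, and the SVM classification step is omitted.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace tokenization; Arabic text would need
    # normalization and a dedicated tokenizer.
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency (one common variant, assumed here).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}

def tfidf_vector(text, idf):
    # Sparse TF-IDF vector as a dict; terms outside the vocabulary are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}

def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve_reply(query, knowledge_base, idf):
    # Return (score, reply) for the stored question most similar to the query.
    query_vec = tfidf_vector(query, idf)
    scored = [(cosine(query_vec, tfidf_vector(question, idf)), reply)
              for question, reply in knowledge_base]
    return max(scored, key=lambda pair: pair[0])

# Hypothetical knowledge base: (stored question, placeholder reply id).
KB = [
    ("how do i reset my email password", "REPLY_PASSWORD_RESET"),
    ("when is the registration deadline for courses", "REPLY_REGISTRATION"),
    ("my salary payment is late this month", "REPLY_PAYROLL"),
]
IDF = build_idf([question for question, _ in KB])
score, reply = retrieve_reply("i forgot my password how to reset it", KB, IDF)
print(reply, round(score, 3))
```

In a full pipeline of the kind the abstract outlines, the predicted category (inquiry or complaint) would presumably narrow the set of candidate replies before this similarity search; that coupling is an assumption here.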
Keywords
Arabic e-mail classification, Machine learning, Support vector machine, TF-IDF, Cosine similarity, Automated customer support.
Cite this article
Almuqati A, Albarrak A, Mudasser Q, Wahsheh H. A data-driven approach to answering customer service e-mail inquiries in the Arabic language. International Journal of Advanced Technology and Engineering Exploration. 2026;13(134):123-143. DOI: 10.19101/IJATEE.2025.121220769
