(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-9 Issue-90 May-2022
Paper Title : Efficient ensemble machine learning techniques for early prediction of diphtheria diseases based on clinical data
Author Name : Bilal Abdualgalil, Sajimon Abraham and Waleed M. Ismael
Abstract :

Diphtheria is a worldwide concern, particularly in Yemen. Early detection is important for reducing diphtheria deaths, but a proper diagnosis takes time because it depends on several clinical examinations. This problem calls for a new diagnostic system. As machine learning (ML) techniques continue to be proposed, ensemble learning techniques have been introduced into healthcare applications. In this study, efficient ensemble ML techniques (EEMLT) are used to develop prediction models for diphtheria disease. Five ensemble ML models were used: random forest classifier (RFC), gradient boosting classifier (GBC), extra tree classifier (ETC), eXtreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). Five popular baseline classifiers were used as benchmarks: logistic regression (LR), k-nearest neighbors (KNN), support vector classifier (SVC), decision tree classifier (DTC), and multilayer perceptron (MLP). All ensemble and baseline classifiers were trained and tested on the dataset using 10-fold cross-validation (CV) and holdout CV approaches. All models were evaluated on a test set using accuracy, F1-score, recall, precision, and area under the curve (AUC). The ETC model achieved high accuracy: 98.92% in holdout CV and 99.2% in 10-fold CV. Finally, the experimental results show that the ensemble classifiers outperformed the baseline classifiers. We believe that the proposed diphtheria prediction system will help doctors predict diphtheria disease accurately.
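
The evaluation protocol named in the abstract (a holdout split, 10-fold CV, and accuracy, precision, recall, and F1-score on multiclass labels) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names, the 80/20 holdout ratio, the random seed, and the use of macro averaging are assumptions made for illustration; the abstract does not state them. AUC and the classifiers themselves are left out.

```python
import random
from collections import Counter


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and return (train, test) index lists.

    The 80/20 ratio is an assumption, not a value taken from the paper.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield k (train, test) index pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy three-class labels, made up for the demonstration.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(macro_scores(y_true, y_pred))
    for train, test in k_fold_splits(25, k=10):
        print(len(train), len(test))
```

In a full experiment, each classifier would be fitted on the training indices of every split and scored on the matching test indices, with the per-fold scores averaged for the 10-fold figure.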

Keywords : Ensemble machine learning, Baseline classifiers, Diphtheria disease, SMOTE+ENN, Multiclass classification.
Cite this article : Abdualgalil B, Abraham S, Ismael WM. Efficient ensemble machine learning techniques for early prediction of diphtheria diseases based on clinical data. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(90):583-603. DOI:10.19101/IJATEE.2021.875402.