
International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print): 2394-5443; ISSN (Online): 2394-7454
Volume 10, Issue 103, June 2023
Paper Title: Performance evaluation of classifiers for the COVID-19 symptom-based dataset using different feature selection methods
Authors: Fauzan Iliya Khalid, Mokhairi Makhtar, Rosaida Rosly and Aceng Sambas
Abstract:

Classification algorithms are commonly employed in healthcare systems to support decisions such as diagnosis, illness prediction, and the choice of treatment regimen. The recent emergence of dominant variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19), has emphasized the importance of early detection for ensuring appropriate treatment and protecting unaffected populations. This study assesses the performance of several classification models on a symptom-based COVID-19 dataset using two feature selection methods: the wrapper method (WrapperSubsetEval) and correlation-based feature subset evaluation (CfsSubsetEval). The two methods are compared on the number of features in the reduced subset, execution time, and classifier accuracy. The experiments are conducted in the WEKA toolkit, and five classifiers are compared for accuracy: J48 decision tree (DT), support vector machine (SVM), naïve Bayes (NB), sequential minimal optimization (SMO), and k-nearest neighbor (KNN). Each model is evaluated with 10-fold cross-validation, and its accuracy is recorded. Comparing the results with and without feature selection shows that KNN combined with WrapperSubsetEval (WrapperSubsetEval+KNN) outperforms the other combinations, achieving the highest accuracy of 98.81%. These results suggest that feature selection is an effective approach for COVID-19 prediction.
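The abstract contrasts two ways of scoring a candidate feature subset. A wrapper evaluator trains the classifier itself and uses its cross-validated accuracy as the score; CFS scores a subset without training any classifier, using Hall's merit Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of the k selected features. The Python sketch below illustrates both, paired with a k-NN classifier and 10-fold cross-validation. It is not the WEKA implementation: greedy forward search, plain (non-stratified) folds, Euclidean distance, absolute Pearson correlation as the association measure, and the toy data from the hypothetical helper `make_toy_data` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean


def knn_predict(X, y, train_idx, x, features, k=1):
    """Majority label among the k training rows nearest to x (Euclidean distance on `features`)."""
    point = [x[f] for f in features]
    # sorted() is stable, so equidistant rows keep their (shuffled) training order.
    nearest = sorted(train_idx, key=lambda j: math.dist([X[j][f] for f in features], point))[:k]
    return Counter(y[j] for j in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, features, folds=10, k=1, seed=1):
    """Accuracy of k-NN on `features` under plain (non-stratified) k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(folds):
        test = set(idx[fold::folds])  # every row lands in exactly one test fold
        train = [i for i in idx if i not in test]
        correct += sum(knn_predict(X, y, train, X[i], features, k) == y[i] for i in test)
    return correct / len(X)


def pearson(a, b):
    """Pearson correlation; 0.0 if either column is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def cfs_merit(X, y, subset):
    """Hall's CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| as the correlation (assumption)."""
    cols = {f: [row[f] for row in X] for f in subset}
    k = len(subset)
    r_cf = mean(abs(pearson(cols[f], y)) for f in subset)  # relevance: mean feature-class correlation
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(cols[a], cols[b])) for a, b in pairs) if pairs else 0.0  # redundancy
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(n_features, score):
    """Greedy forward search: add the feature that most improves score(subset); stop when none does."""
    selected, best = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        value, feat = max((score(selected + [f]), f) for f in remaining)
        if value <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = value
    return selected, best


def make_toy_data(n=120, n_features=6, seed=0):
    """Hypothetical binary 'symptom' rows (not the study's data); the label depends only on features 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [int(row[0] or row[1]) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    n = len(X[0])
    print("all features, k-NN CV accuracy:", cv_accuracy(X, y, list(range(n))))
    print("wrapper (k-NN) subset:", forward_select(n, lambda s: cv_accuracy(X, y, s)))
    print("CFS subset:", forward_select(n, lambda s: cfs_merit(X, y, s)))
```

Run as a script, it prints the k-NN accuracy with all features and the subset chosen by each evaluator. The wrapper retrains k-NN once per fold for every candidate subset, which is why wrapper methods usually cost far more execution time than CFS. Because the wrapper here picks its subset using the same folds that then report its accuracy, that figure is optimistic; nested cross-validation would give a less biased estimate. The abstract does not say which search method was paired with each evaluator in WEKA, so greedy forward search is only one plausible choice.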

Keywords: Classification, Machine learning, Feature selection, COVID-19.
Cite this article: Khalid FI, Makhtar M, Rosly R, Sambas A. Performance evaluation of classifiers for the COVID-19 symptom-based dataset using different feature selection methods. International Journal of Advanced Technology and Engineering Exploration. 2023; 10(103):741-761. DOI:10.19101/IJATEE.2023.10101228.