International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-9 Issue-87 February-2022
Paper Title : Machine learning techniques with ANOVA for the prediction of breast cancer
Author Name : Bharti Thakur, Nagesh Kumar and Gaurav Gupta
Abstract :

Breast cancer is one of the most common cancers among women. In this paper, machine learning techniques are applied to the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset to extract the most informative clinical attributes. Analysis of variance (ANOVA) is used for clinical feature selection. Five machine learning algorithms are implemented: support vector machine (SVM), decision tree, random forest, AdaBoost, and artificial neural network (ANN). Among these classifiers, ANN achieves the highest accuracy, 87.43%. The approach can support the detection of breast cancer, which may in turn help improve survival rates.
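
As an illustration of the ANOVA feature-selection step named in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the one-way ANOVA F statistic of each feature against the class label and keeps the highest-scoring features. The function names, the feature names, and the toy values are hypothetical and are not taken from the METABRIC dataset or from the paper.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = mean(values)
    # Between-group sum of squares: spread of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(rows, labels, names, top_k):
    """Rank features by F score and return the top_k names plus all scores."""
    scores = {
        name: anova_f_score([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: three clinical-style features, six patients, two classes.
names = ["age_at_diagnosis", "tumor_size", "noise"]
rows = [
    [45.0, 12.0, 1.0],
    [50.0, 15.0, 3.0],
    [47.0, 14.0, 2.0],
    [62.0, 30.0, 2.0],
    [65.0, 34.0, 1.0],
    [60.0, 32.0, 3.0],
]
labels = [0, 0, 0, 1, 1, 1]

selected, scores = select_top_features(rows, labels, names, top_k=2)
print(selected)  # tumor_size and age_at_diagnosis rank above noise
```

A feature whose class means differ widely relative to its within-class spread receives a high F score. In this toy data the "noise" column has identical class means, so it scores zero and is dropped. The selected columns would then be passed to the classifiers; that step is not shown here.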

Keywords : Breast cancer, Genes, ANOVA, ANN, SVM, Machine learning, Healthcare.
Cite this article : Thakur B, Kumar N, Gupta G. Machine learning techniques with ANOVA for the prediction of breast cancer. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(87):232-245. DOI:10.19101/IJATEE.2021.874555.