(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-10 Issue-49 July-2020
Paper Title : Predictive and perspective analysis of cancer image data set using machine learning algorithms
Author Name : Divya Chauhan and Kishori Lal Bansal
Abstract :

Classifying and predicting images is a fairly easy task for humans, but it takes more effort for a machine to do the same. Machine learning helps to attain this goal. It automates the task of classifying a large collection of images into different classes by labelling the incoming data and recognizing patterns in it, which are subsequently translated into valuable insights. The aim of this paper is to classify an image data set of five cancer types, namely Osteosarcoma, Prostate Cancer, Brain Cancer, Breast Cancer and Acute Myeloid Leukaemia. Furthermore, each Osteosarcoma case is to be assigned to one of four tumor classes, namely Non-tumor, Non-viable tumor, Viable tumor, and Viable: Non-viable tumor. The quantitative analysis is done using various machine learning libraries of Python. The three classification algorithms used for image analysis are random forest, SVM, and logistic regression. The metrics used for the perspective analysis are precision, recall and F1 score. The results show that random forest performed best among the three classification algorithms in the less complicated scenario, with prediction accuracy, precision, recall and F1 score of 100%. However, the performance of every classification algorithm degrades on the Osteosarcoma cases, which have a more complicated scatter graph. Even so, logistic regression retains its performance, predicting tumor cases with 99% accuracy.
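
As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how per-class precision, recall and F1 score, together with overall accuracy, can be computed from true and predicted labels. It is not the authors' code: the function names `per_class_metrics` and `accuracy`, the shortened class names, and the six example labels are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class gains a false positive
            fn[truth] += 1          # true class gains a false negative

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]   # times the class was predicted
        actual = tp[label] + fn[label]      # times the class truly occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels (illustrative only, not data from the paper).
y_true = ["non-tumor", "non-tumor", "viable", "viable", "non-viable", "non-viable"]
y_pred = ["non-tumor", "non-tumor", "viable", "non-viable", "non-viable", "non-viable"]

metrics = per_class_metrics(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)

for label, (p, r, f1) in metrics.items():
    print(f"{label:12s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={overall_accuracy:.2f}")
```

In this toy example one "viable" sample is misclassified as "non-viable", so recall drops for "viable" and precision drops for "non-viable" while "non-tumor" stays perfect. This shows why the three metrics are reported alongside accuracy when classes are harder to separate.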

Keywords : Data mining, Big data, Hadoop, Mahout, Clustering, Health care.
Cite this article : Chauhan D, Bansal KL. Predictive and perspective analysis of cancer image data set using machine learning algorithms. International Journal of Advanced Computer Research. 2020; 10(49):161-170. DOI:10.19101/IJACR.2020.1048064.