International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-11 Issue-113 April-2024
Paper Title : Optimizing software fault prediction using decision tree regression and soft computing techniques
Author Name : Gurmeet Kaur, Jyoti Pruthi and Parul Gandhi
Abstract :

This research aims to develop a framework for software fault prediction (SFP) using machine learning techniques. A software fault can cause software to fail, and even a minor fault may lead to failure. Efficient SFP improves the overall quality and performance of software products while streamlining the development process. The framework aims to reduce the cost and time of software development while improving software reliability. It facilitates quick and efficient testing by identifying, at the early stages of a project, the modules that are likely to fail. Soft computing techniques provide an easy and effective solution for prediction problems. This study emphasizes the significance of soft computing approaches in SFP and highlights their role in improving computational efficiency, reducing development costs, and enhancing the reliability of software applications. A soft computing-based technique is proposed to address the prediction challenges. A metric suite is suggested, which includes a requirement-based metric and an adoption metric, designed by integrating process metrics from the software development phases for fault prediction. The study also designs a decision tree regression (DTR)-based SFP model that takes these metrics as input and delivers predicted faults as output. The literature review reveals that only a few existing frameworks implement SFP models using a broad range of soft computing approaches on the same dataset. The suggested metric suite is validated by computing performance measures such as the area under the curve (AUC), F-measure, precision, recall, and accuracy. The high values of these measures demonstrate the efficient fault prediction capability of the suggested metric suite.
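As an illustration of the validation measures named above, the following minimal Python sketch computes precision, recall, F-measure, accuracy, and AUC for a binary faulty/clean labelling. It uses common textbook definitions, which the abstract itself does not spell out, and the labels, scores, and 0.5 threshold are made-up toy values, not data from the study.

```python
def classification_measures(y_true, y_pred):
    """Precision, recall, F-measure and accuracy for binary labels (1 = faulty)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


def auc(y_true, scores):
    """AUC as the fraction of (faulty, clean) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = faulty module, 0 = clean module.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # predicted fault-proneness
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

measures = classification_measures(y_true, y_pred)
auc_value = auc(y_true, scores)
print(measures)   # all four measures are 0.75 for this toy data
print(auc_value)  # 0.9375
```

The pairwise AUC shown here is quadratic in the number of modules; it is chosen for readability rather than speed.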
The study also compares the performance of the suggested model with SFP models based on the adaptive neuro-fuzzy inference system (ANFIS), the fuzzy inference system, and Bayesian nets, as measured by root mean square error (RMSE), normalized root mean square error (NRMSE), mean magnitude of relative error (MMRE), balanced mean magnitude of relative error (BMMRE), and R-squared. The suggested model outperforms the others, achieving RMSE, MMRE, and R-squared values of 3.54, 2.04e-05, and 99.78, respectively. This study presents a highly efficient DTR-based SFP model with higher fault prediction accuracy than the existing SFP models. Implementing this model is expected to significantly reduce the cost, time, and effort of software development, making it an invaluable tool for software engineers.
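To make the idea of a DTR-based predictor and its error measures concrete, here is a minimal Python sketch of a small squared-error regression tree, scored with RMSE, NRMSE, MMRE, BMMRE, and R-squared. It is not the authors' implementation. The two input columns only stand in for the requirement-based and adoption metrics, the numbers are invented toy values, and the formulas are common definitions assumed here (NRMSE normalized by the range of actual values, BMMRE dividing by the smaller of actual and predicted) because the abstract does not state them.

```python
import math


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2):
    """Grow a regression tree; each split minimizes the total squared error."""
    if depth >= max_depth or len(set(y)) == 1:
        return {"value": sum(y) / len(y)}
    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2  # midpoint between neighbouring values
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            sse = sum_sq(left) + sum_sq(right)
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:  # no feature has two distinct values
        return {"value": sum(y) / len(y)}
    _, j, thr = best
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([X[i] for i in li], [y[i] for i in li],
                           depth + 1, max_depth),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


def error_measures(actual, predicted):
    """Assumed definitions; actual and predicted values must be positive."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    ss_res = sum((a - p) ** 2 for a, p in pairs)
    rmse = math.sqrt(ss_res / n)
    return {
        "rmse": rmse,
        "nrmse": rmse / (max(actual) - min(actual)),
        "mmre": sum(abs(a - p) / a for a, p in pairs) / n,
        "bmmre": sum(abs(a - p) / min(a, p) for a, p in pairs) / n,
        "r2": 1 - ss_res / sum_sq(actual),
    }


# Toy data (hypothetical): [requirement metric, adoption metric] -> fault count.
X = [[1, 5], [2, 4], [3, 6], [6, 2], [7, 3], [8, 1]]
y = [2, 2, 3, 8, 9, 10]

tree = build_tree(X, y, max_depth=2)
predictions = [predict(tree, row) for row in X]
scores = error_measures(y, predictions)  # in-sample only, not a validation
print(predictions)
print(scores)
```

The scores here are computed on the training rows purely to show the arithmetic; a real comparison would evaluate on held-out data.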

Keywords : Software fault prediction, Predicted-fault, Process metrics, Soft-computing, Decision tree regression, Machine learning.
Cite this article : Kaur G, Pruthi J, Gandhi P. Optimizing software fault prediction using decision tree regression and soft computing techniques. International Journal of Advanced Technology and Engineering Exploration. 2024; 11(113):604-623. DOI:10.19101/IJATEE.2023.10101890.