Application of the forward selection strategy to the C4.5 Algorithm to improve the classification's accuracy of a breast cancer data set


Robbi Rahim


Forward Selection, Data Mining, Classification, Method C4.5, Breast Cancer


This study aims to improve the classification accuracy of the C4.5 algorithm by applying the forward selection technique. The dataset is the Breast Cancer dataset from the UCI Machine Learning Repository, which contains 286 records with 9 attributes and 1 class (label). The proposed model was compared with two existing classification models (C4.5 and Naive Bayes) using RapidMiner. The procedure consists of several stages. The first selects the dominant attributes using a feature selection technique (weight by information gain). The second applies forward selection based on the outcome of the feature selection. Before processing, the dataset is split into training and testing sets at ratios of 70:30, 80:20, and 90:10. The final stage examines the output. The experimental results show that C4.5 with forward selection (C4.5+FS) outperforms the C4.5 and Naive Bayes classification techniques. C4.5+FS achieves an accuracy of 76.74 percent with a 70:30 split, 78.95 percent with an 80:20 split, and 78.57 percent with a 90:10 split, whereas C4.5 with a 70:30 split achieves 65.12 percent and Naive Bayes with a 70:30 split has an accuracy value of [...]. Compared with the conventional classification algorithms (C4.5 and Naive Bayes), the average accuracy increased by 12.97 percent and 8.32 percent, respectively. In terms of precision, recall, and F-measure, C4.5+FS outperformed all other classification techniques, achieving 79.84 percent, 92.50 percent, and 85.55 percent, respectively. In addition, the average AUC increased from 0.628 to 0.732. It can therefore be inferred that forward selection can be applied to the Breast Cancer dataset to increase the classification accuracy of C4.5.
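The two selection stages described in the abstract can be illustrated with a minimal Python sketch. It is not the RapidMiner process used in the study. The toy records, the attribute indices, the function names, and the lookup-table classifier that stands in for C4.5 are all assumptions made for illustration, and the abstract does not say how the information gain weights feed into forward selection, so the sketch simply uses them to order the candidate attributes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute at index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def holdout_accuracy(attrs, train_rows, train_labels, test_rows, test_labels):
    """Stand-in classifier (NOT C4.5): majority label per attribute-value combination."""
    default = Counter(train_labels).most_common(1)[0][0]
    votes = {}
    for row, label in zip(train_rows, train_labels):
        votes.setdefault(tuple(row[a] for a in attrs), Counter())[label] += 1
    hits = 0
    for row, label in zip(test_rows, test_labels):
        counts = votes.get(tuple(row[a] for a in attrs))
        predicted = counts.most_common(1)[0][0] if counts else default
        hits += predicted == label
    return hits / len(test_labels)

def forward_selection(candidates, score):
    """Greedily add the attribute that most improves score; stop when nothing improves."""
    selected, best = [], score([])
    remaining = list(candidates)
    while remaining:
        scored = [(score(selected + [a]), a) for a in remaining]
        top_score, top_attr = max(scored, key=lambda pair: pair[0])  # ties keep candidate order
        if top_score <= best:
            break
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected, best

# Hypothetical toy records (three categorical attributes); not the UCI data.
train_rows = [("yes", "left", "low"), ("yes", "right", "high"), ("yes", "left", "high"),
              ("no", "right", "low"), ("no", "left", "low"), ("no", "right", "high"),
              ("no", "left", "high")]
train_labels = ["recur", "recur", "recur", "no-recur", "no-recur", "no-recur", "no-recur"]
test_rows = [("yes", "right", "low"), ("no", "left", "high"),
             ("yes", "left", "low"), ("no", "right", "low")]
test_labels = ["recur", "no-recur", "recur", "no-recur"]

# Stage 1: rank attributes by information gain on the training set.
ranked = sorted(range(3), key=lambda a: information_gain(train_rows, train_labels, a),
                reverse=True)

# Stage 2: forward selection over the ranked candidates, scored by hold-out accuracy.
selected, best = forward_selection(
    ranked,
    lambda attrs: holdout_accuracy(attrs, train_rows, train_labels, test_rows, test_labels),
)
print("ranked:", ranked, "selected:", selected, "accuracy:", best)
```

In this toy setting the first attribute separates the classes on its own, so forward selection stops after adding it. On real data the scoring function would be replaced by the accuracy of the actual classifier on the chosen train/test split.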


