Ensemble-Based Machine Learning for Early Detection and Risk Prediction of Cardiovascular Diseases
Keywords
Heart disease prediction, Machine learning, Ensemble classifier, SHAP, Decision Tree, Naive Bayes, SVM, KNN, Logistic Regression, RF, Gradient Boosting, XGB
Abstract
Cardiac diseases are among the leading causes of death globally, and early, accurate detection can save lives and improve health outcomes. In this research, we used machine learning techniques to predict cardiac diseases. Four widely known datasets were combined to create a diverse dataset with 1,190 instances and 12 attributes. Nine machine learning models were applied: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, Support Vector Machine, Naive Bayes, XGBoost, AdaBoost, and Gradient Boosting. Ensemble methods such as Stacking, Bagging, Voting, Random Subspace, LightGBM, and CatBoost were also implemented to improve performance. The highest accuracy achieved was almost 90%, using an ensemble framework that integrates multiple machine learning classification methods. The study analyzes cardiac data based on essential features, and the models were evaluated using precision, recall, and F1-score. SHAP (SHapley Additive exPlanations) was used to analyze the contribution of each feature to the prediction of cardiac diseases. SHAP makes machine learning models more understandable by showing how individual features affect predictions. It helped identify important attributes, such as age, cholesterol levels, and exercise-induced angina, that significantly influence the model's outcomes. SHAP summary plots were used to visualize feature importance and the interaction of features with predictions, improving the model's transparency and highlighting the key factors contributing to heart disease risk. Because SHAP makes the results easier to interpret, the predictions are better suited to supporting medical decision-making. The proposed framework, which combines machine learning and ensemble techniques, outperforms the individual models and demonstrates improved prediction accuracy and effectiveness compared to existing approaches.
This research introduces an innovative and practical solution for predicting heart disease in real-world clinical settings. It contributes to reducing the healthcare and societal burden caused by cardiovascular diseases. The study emphasizes the potential of advanced machine learning techniques in improving healthcare outcomes.
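The three ideas the abstract relies on (combining classifiers by voting, scoring with precision, recall, and F1-score, and attributing a prediction to features with Shapley values) can be illustrated with a minimal, dependency-free sketch. This is not the study's implementation: the predictions, the linear risk score, its weights, and the baseline patient are all made-up values chosen for illustration, and the Shapley routine enumerates feature subsets exactly, which is only practical for a handful of features (the SHAP library uses faster approximations).

```python
from itertools import combinations
from math import factorial


def majority_vote(predictions):
    """Hard voting: each row of `predictions` holds one model's 0/1 labels."""
    n_models = len(predictions)
    # A sample is labelled 1 when more than half of the models predict 1.
    return [int(sum(votes) * 2 > n_models) for votes in zip(*predictions)]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1-score for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        return model([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical labels and predictions for six patients (1 = heart disease).
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print("ensemble:", ensemble)
print("model_a (P, R, F1):", precision_recall_f1(y_true, model_a))
print("ensemble (P, R, F1):", precision_recall_f1(y_true, ensemble))


# Hypothetical linear risk score over [age, cholesterol, exercise angina].
def risk_score(v):
    return 0.03 * v[0] + 0.01 * v[1] + 1.5 * v[2]


patient = [60, 250, 1]
baseline = [50, 200, 0]  # assumed reference patient
phi = shapley_values(risk_score, patient, baseline)
print("Shapley values:", phi)
```

In this toy data each model errs on a different patient, so the majority vote corrects the individual mistakes; real ensembles gain less when their members make correlated errors. The Shapley values sum to the difference between the patient's score and the baseline score, which is the additivity property that SHAP plots rely on.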