Improving Cardiovascular Disease Prediction through Stratified Machine Learning Models and Combined Datasets
DOI:
https://doi.org/10.21928/uhdjst.v9n1y2025.pp149-168
Keywords:
Cardiovascular Disease, Gradient Boosting, Heart Disease, K-Nearest Neighbors, Logistic Regression, Naive Bayes, Support Vector Machine
Abstract
The global rise in cardiovascular disease (CVD) cases underscores the critical need for accurate and early diagnostic solutions. This study introduces a robust machine learning (ML) framework for predicting CVD risk by integrating two large, feature-identical datasets containing clinical and biological indicators along with patient history. Seven classification algorithms were employed: logistic regression (LR), random forest (RF), support vector machine (SVM), Gaussian naive Bayes (GNB), gradient boosting (GB), K-nearest neighbors (KNN), and decision tree (DT). A stratified sampling strategy was used to keep the class distribution consistent across data splits, and model performance was further validated using k-fold cross-validation to enhance robustness and generalizability. The datasets, sourced from the UCI repository, were pre-processed, and the models were evaluated using accuracy, precision, F1-score, log loss, and error rate, with confusion matrices used for further assessment. Results revealed that tree-based models, particularly RF and DT, achieved the best performance with 100% accuracy, while stratification significantly improved the outcomes of SVM, GNB, and GB. The integration of datasets, stratified sampling, and k-fold validation effectively enhanced model reliability while minimizing overfitting. These findings highlight the potential of ML to support early CVD diagnosis and lay the groundwork for future research on hybrid models and real-world clinical applications.
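The stratified k-fold validation and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stratified_kfold` and `binary_metrics`, the 0.5 decision threshold, and the binary 0/1 labelling are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror `labels`.

    Hypothetical helper: each class is shuffled and dealt round-robin
    across the k folds, so every fold gets a near-equal share of it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, F1, log loss and error rate for 0/1 labels.

    `y_prob` holds predicted probabilities of class 1; the threshold
    is an assumed default, not a value taken from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(y_true)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    eps = 1e-15  # clip probabilities so log() never receives 0
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "log_loss": log_loss,
        "error_rate": 1 - accuracy,
    }


if __name__ == "__main__":
    labels = [0] * 60 + [1] * 40  # toy labels, not the study's data
    for train_idx, test_idx in stratified_kfold(labels, k=5):
        positives = sum(labels[i] for i in test_idx)
        print(len(train_idx), len(test_idx), positives)
    print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would normally replace the hand-written splitter. The sketch only makes explicit that stratification keeps each fold's class ratio close to that of the full dataset, which is a separate matter from rebalancing the classes.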
License
Copyright (c) 2025 Tara Yousif Mawlood, Alla Ahmad Hassan, Rebwar Khalid Muhammed, Aso M. Aladdin, Tarik A. Rashid, Bryar A. Hassan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.