Classification and Prediction of Heart Diseases using Machine Learning Algorithms
Akua Sekyiwaa Osei-Nkwantabisa, Redeemer Ntumy
TL;DR
This study compares four machine learning classifiers (Logistic Regression, K-Nearest Neighbors, Support Vector Machine, and Artificial Neural Networks) for heart disease prediction on the UCI Heart Disease dataset, using an 80/20 train/test split and GridSearchCV hyperparameter tuning. After tuning, K-Nearest Neighbors achieves the highest accuracy (~0.87), followed closely by Logistic Regression (~0.86), while SVM (~0.81) and ANN (~0.74) lag behind, highlighting the impact of tuning and dataset imbalance. The analysis shows that accessible, relatively simple models can attain strong predictive performance on standard heart-disease datasets, and it offers practical guidance on handling skewed features, data imbalance, and feature selection to improve robustness. Overall, the work informs clinical screening by identifying effective, low-cost ML classifiers and outlining strategies to improve prediction accuracy on imbalanced medical datasets.
Abstract
Heart disease is a serious worldwide health issue because it claims the lives of many people who might have been treated had the disease been identified earlier. Cardiovascular disease, commonly referred to as heart disease, is the leading cause of death in the world. Producing reliable, efficient, and accurate predictions of these diseases is one of the biggest challenges facing the medical community today. Although tools for predicting heart disease exist, they are either expensive or difficult to apply when determining a patient's risk. The aim of this research was to identify the best classifier for predicting and detecting heart disease. The study examined a range of machine learning approaches, including Logistic Regression, K-Nearest Neighbor, Support Vector Machine, and Artificial Neural Networks, to determine which was most effective at predicting heart disease. The data came from the UCI heart disease repository, one of the most frequently used data sets for this purpose. The K-Nearest Neighbor technique proved to be the most effective of the algorithms examined for determining whether a patient has heart disease. Further studies applying additional machine learning algorithms to heart disease prediction would be beneficial.
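The comparison described above (four classifiers, an 80/20 train/test split, and GridSearchCV tuning) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code: the data is a synthetic stand-in generated with make_classification rather than the UCI heart disease data, and the parameter grids, feature scaling step, 5-fold cross-validation, and random seeds are all assumptions chosen for brevity. The accuracies it prints therefore do not correspond to the results reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the UCI heart disease data
# (a few hundred rows, 13 features, binary target).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# 80/20 train/test split, stratified so both parts keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Each candidate pairs an estimator with a small, illustrative grid.
# Grid keys use the step names that make_pipeline derives from class names.
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "K-Nearest Neighbors": (
        KNeighborsClassifier(),
        {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]},
    ),
    "Support Vector Machine": (
        SVC(),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
    "Artificial Neural Network": (
        MLPClassifier(max_iter=2000, random_state=0),
        {"mlpclassifier__hidden_layer_sizes": [(8,), (16,)]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each CV fold.
    pipeline = make_pipeline(StandardScaler(), model)
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # Score the best configuration found on the held-out 20%.
    results[name] = search.score(X_test, y_test)

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: test accuracy = {acc:.2f}")
```

Putting the scaler inside the pipeline is a design choice in this sketch: it keeps the held-out folds from influencing the scaling statistics during the grid search, which matters for distance-based models such as K-Nearest Neighbors and SVM.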
