Classification and Prediction of Heart Diseases using Machine Learning Algorithms

Akua Sekyiwaa Osei-Nkwantabisa, Redeemer Ntumy

TL;DR

This study addresses heart disease prediction by comparing four machine learning classifiers—Logistic Regression, K-Nearest Neighbors, Support Vector Machine, and Artificial Neural Networks—on the UCI Heart Disease dataset with an 80/20 train/test split and GridSearchCV hyperparameter tuning. Post-tuning results show K-Nearest Neighbors achieving the highest accuracy (~0.87), followed closely by Logistic Regression (~0.86), while SVM (~0.81) and ANN (~0.74) lag behind, highlighting the impact of tuning and dataset imbalance. The analysis demonstrates that accessible, relatively simple models can attain strong predictive performance on standard heart-disease datasets and provides practical guidance on handling skewed features, data imbalance, and feature selection for improved robustness. Overall, the work informs clinical screening by identifying effective, low-cost ML classifiers and outlining strategies to enhance prediction accuracy in imbalanced medical datasets.

Abstract

Heart disease is a serious worldwide health issue because it claims the lives of many people who might have been treated had the disease been identified earlier. Cardiovascular disease, usually referred to as heart disease, is the leading cause of death in the world. Producing reliable, efficient, and accurate predictions of these diseases is one of the biggest challenges facing the medical world today. Although tools for predicting heart disease exist, they are either expensive or difficult to apply when determining a patient's risk. The aim of this research was to identify the best classifier for predicting and detecting heart disease. The experiment compared four machine learning approaches, namely Logistic Regression, K-Nearest Neighbor, Support Vector Machine, and Artificial Neural Networks, to determine which was most effective at predicting heart disease. The data came from the UCI heart disease repository, one of the most frequently used data sets for this purpose. The K-Nearest Neighbor technique proved to be the most effective of these algorithms for determining whether a patient has heart disease. Further studies on the application of additional machine learning algorithms to heart disease prediction would be beneficial.
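
The comparison summarized above (an 80/20 train/test split followed by GridSearchCV tuning of four classifiers) can be sketched roughly as follows with scikit-learn. This is an illustrative sketch, not the authors' code. Several things here are assumptions: the data is a synthetic stand-in generated by make_classification with arbitrary sizes rather than the UCI heart disease data, the parameter grids are hypothetical, and the standardization step, stratified split, 5-fold cross-validation, and use of MLPClassifier as the neural network are choices made for the example. The accuracies it prints therefore say nothing about the figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic binary-classification data stands in for the
# UCI heart disease data; the sample and feature counts are arbitrary.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)

# 80/20 train/test split, as described in the summary.
# ASSUMPTION: the split is stratified and seeded for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# ASSUMPTION: hypothetical parameter grids; the grids actually searched
# in the paper are not given in this section.
candidates = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "K-Nearest Neighbors": (
        KNeighborsClassifier(),
        {"clf__n_neighbors": [3, 5, 7, 9, 11]},
    ),
    "Support Vector Machine": (
        SVC(),
        {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]},
    ),
    "Artificial Neural Network": (
        MLPClassifier(max_iter=2000, random_state=0),
        {"clf__hidden_layer_sizes": [(8,), (16,)]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # ASSUMPTION: features are standardized before each classifier.
    pipeline = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        # Accuracy of the refitted best model on the held-out 20%.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, outcome in results.items():
    print(f"{name}: test accuracy = {outcome['test_accuracy']:.2f}, "
          f"best params = {outcome['best_params']}")
```

Tuning inside a pipeline keeps the scaler fitted only on the training folds, so the held-out 20% is never seen during the search.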

Paper Structure

This paper contains 11 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Subplots of the Qualitative Variables
  • Figure 2: Subplots of the Quantitative Variables
  • Figure 3: Correlation Matrix for Quantitative Variables
  • Figure 4: Heart Disease Frequency for Sex
  • Figure 5: Performance of Machine Learning Algorithms Before Hyper-Parameter Tuning
  • ...and 3 more figures