
Multi-class heart disease Detection, Classification, and Prediction using Machine Learning Models

Mahfuzul Haque, Abu Saleh Musa Miah, Debashish Gupta, Md. Maruf Al Hossain Prince, Tanzina Alam, Nusrat Sharmin, Mohammed Sowket Ali, Jungpil Shin

TL;DR

The paper tackles the problem of accurate heart-disease detection and classification in real-world Bangladeshi populations by building three ethically sourced datasets (HDD, BIG-Dataset, CD Dataset) and evaluating Logistic Regression and Random Forest classifiers. It demonstrates that feature engineering on 19 symptoms and 4 risk factors, combined with an LR/RF pipeline and randomized hyperparameter search, yields high predictive performance, with the CD Dataset achieving the highest test accuracy of 96.66% via Random Forest. The key contributions are the introduction of real-world, diverse datasets and a comprehensive comparison across multiclass and binary tasks, establishing RF as a robust choice for complex, heterogeneous clinical data. The work has practical significance for scalable, real-time diagnostic support and personalized healthcare planning, with potential to improve clinical outcomes and reduce mortality through earlier, data-driven decision-making.
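The LR/RF pipeline with randomized hyperparameter search described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code. The data are synthetic: 23 binary columns stand in for the 19 symptoms and 4 risk factors. The 4-class label and its binary collapse ("any disease" vs "none") are hypothetical. The search spaces, `n_iter`, and fold counts are illustrative choices that do not come from the paper.

```python
import numpy as np
from scipy.stats import loguniform, randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical stand-in data: 19 binary symptom flags + 4 binary risk factors.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 19 + 4))

# Hypothetical labels: a 4-class target and its binary collapse (0 = no disease).
multiclass_y = X[:, 0] + 2 * X[:, 19]
tasks = {
    "multiclass": multiclass_y,
    "binary": (multiclass_y > 0).astype(int),
}

# Illustrative search spaces (assumptions, not the paper's settings).
models = {
    "LR": (LogisticRegression(max_iter=1000), {"C": loguniform(1e-3, 1e2)}),
    "RF": (
        RandomForestClassifier(random_state=0),
        {
            "n_estimators": randint(50, 150),
            "max_depth": [None, 5, 10],
            "min_samples_leaf": randint(1, 4),
        },
    ),
}

results = {}
for task, y in tasks.items():
    # Stratified hold-out split so every class appears in the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    for name, (estimator, space) in models.items():
        # RandomizedSearchCV clones the estimator, so reusing it across tasks is safe.
        search = RandomizedSearchCV(
            estimator, space, n_iter=5, cv=3, scoring="accuracy", random_state=0
        )
        search.fit(X_train, y_train)  # refits the best configuration on X_train
        results[(task, name)] = search.score(X_test, y_test)  # held-out accuracy

for (task, name), acc in sorted(results.items()):
    print(f"{task:10s} {name}: test accuracy = {acc:.4f}")
```

Searching both models over the same stratified split keeps the LR vs RF comparison like-for-like within each task. On the paper's real data, the search spaces would need to be tuned to the datasets' size and class balance.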

Abstract

Heart disease is a leading cause of premature death worldwide, particularly among middle-aged and older adults, with men experiencing a higher prevalence. According to the World Health Organization (WHO), non-communicable diseases, including heart disease, account for 25% (17.9 million) of global deaths, with over 43,204 annual fatalities in Bangladesh. However, heart disease detection (HDD) systems tailored to the Bangladeshi population remain underexplored because benchmark datasets are lacking and existing work relies on manual or limited-data approaches. This study addresses these gaps by introducing three new, ethically sourced datasets (the HDD dataset, BIG-Dataset, and CD Dataset) that incorporate comprehensive data on symptoms, examination techniques, and risk factors. Using machine learning models, including Logistic Regression and Random Forest, we achieved a testing accuracy of up to 96.6% with Random Forest. The proposed AI-driven system integrates these models and datasets to provide real-time, accurate diagnostics and personalized healthcare recommendations. By combining structured datasets with machine learning algorithms, this research offers a scalable and effective approach to heart disease detection, with the potential to reduce mortality rates and improve clinical outcomes.

Paper Structure

This paper contains 18 sections, 3 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Distribution of the class-label ratios for the BIG-Dataset.
  • Figure 2: Heatmap of the correlation matrix for the CD dataset features.
  • Figure 3: Workflow of the proposed model and evaluation pipeline.
  • Figure 4: Performance Analysis of Classification Model.
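The quantities behind Figures 2 and 4 (a feature correlation matrix and per-class classification performance) can be computed roughly as below. This is a hedged sketch on synthetic data. The column names `symptom_1` to `symptom_19` and `risk_1` to `risk_4` and the label rule are hypothetical placeholders, not the CD Dataset's actual schema. The Random Forest settings are illustrative. Rendering `corr` as a heatmap would need a plotting library, and that step is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical CD-style table: binary symptom and risk-factor columns (made-up names).
columns = [f"symptom_{i}" for i in range(1, 20)] + [f"risk_{i}" for i in range(1, 5)]
df = pd.DataFrame(rng.integers(0, 2, size=(500, len(columns))), columns=columns)
# Hypothetical 4-class label derived from two columns.
df["label"] = df["symptom_1"] + 2 * df["risk_1"]

# Figure 2-style input: pairwise Pearson correlation of the feature columns.
corr = df[columns].corr()

# Figure 4-style evaluation: hold-out confusion matrix and per-class metrics.
X_train, X_test, y_train, y_test = train_test_split(
    df[columns], df["label"], test_size=0.25, stratify=df["label"], random_state=1
)
model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows = true class, columns = predicted class

print(corr.shape)
print(cm)
print(classification_report(y_test, y_pred))
```

A confusion matrix complements a single accuracy figure in the multiclass setting, because it shows which heart-disease classes are confused with one another.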