Explainable LightGBM Approach for Predicting Myocardial Infarction Mortality

Ana Letícia Garcez Vicente, Roseval Donisete Malaquias Junior, Roseli A. F. Romero

TL;DR

This work addresses mortality prediction for myocardial infarction patients using tabular hospital data with missing values. It evaluates three boosted-tree ensemble methods (XGBoost, LightGBM, CatBoost) under different preprocessing regimes and uses Tree SHAP for model interpretability. A key finding is that LightGBM without preprocessing achieves the top performance (F1 ≈ 0.912, accuracy ≈ 0.918), and an ablation study shows that preprocessing does not always improve results. The results suggest that a simpler, more interpretable pipeline could robustly support clinical decisions such as hospital admission guidance for acute myocardial infarction (AMI) patients.
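
One reason boosted trees can work without imputation is that a split can route missing values to whichever branch fits the training data better. The sketch below illustrates that idea on a single threshold split using Gini impurity. It is a simplified illustration, not LightGBM's actual histogram- and gradient-based implementation, and the feature values, labels, and function names are hypothetical.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def weighted_impurity(left, right):
    """Size-weighted average impurity of the two branches."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_missing_direction(values, labels, threshold):
    """Try sending missing values left and right; keep the purer option."""
    left, right, missing = [], [], []
    for x, y in zip(values, labels):
        if math.isnan(x):
            missing.append(y)
        elif x <= threshold:
            left.append(y)
        else:
            right.append(y)
    candidates = {
        "left": weighted_impurity(left + missing, right),
        "right": weighted_impurity(left, right + missing),
    }
    direction = min(candidates, key=candidates.get)
    return direction, candidates[direction]

# Hypothetical toy data: one lab measurement with two missing entries.
nan = float("nan")
lab_value = [1.0, 2.0, nan, 8.0, 9.0, nan]
died = [0, 0, 1, 1, 1, 1]

direction, impurity = best_missing_direction(lab_value, died, threshold=5.0)
print(direction, impurity)
```

In this toy data the patients with a missing measurement share the outcome of the high-value group, so routing them to the right branch gives two pure leaves, whereas imputing a low value would have mixed them into the wrong group.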

Abstract

Myocardial infarction is a leading cause of mortality globally, and accurate risk prediction is crucial for improving patient outcomes. Machine learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and missing values, posing challenges for feature selection and imputation methods. In this article, we investigate the impact of data preprocessing and compare three boosted-tree ensemble methods for predicting the risk of mortality in patients with myocardial infarction. Further, we use the Tree Shapley Additive Explanations method to identify relationships among the features underlying the predictions, leveraging the entirety of the available data in the analysis. Notably, our approach outperformed other existing machine learning approaches, with an F1-score of 91.2% and an accuracy of 91.8% for LightGBM without data preprocessing.
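
For readers less familiar with the two reported metrics, the following minimal sketch computes accuracy and the F1-score from binary confusion-matrix counts. The counts are made up for illustration and are not taken from the paper, and the abstract does not state which F1 averaging the authors used, so the single positive-class F1 shown here is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and positive-class F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return accuracy, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical counts: 40 true positives, 5 false positives,
# 5 false negatives, 50 true negatives.
acc, f1 = accuracy_and_f1(tp=40, fp=5, fn=5, tn=50)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Accuracy rewards correct predictions of both classes equally, while F1 ignores true negatives, which is why the two numbers can differ when the classes are imbalanced.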

Paper Structure (12 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Shapley values showing the influence of the most important features on mortality prediction
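
To make the quantity plotted in such a figure concrete, here is a minimal sketch that computes exact Shapley values by brute force, averaging each feature's marginal contribution over all feature orderings. This is not Tree SHAP, which obtains Shapley values efficiently by exploiting tree structure. The toy model, the three features, and the all-zero baseline are assumptions made purely for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value.
            current[i] = x[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orders) for total in phi]

def toy_risk(features):
    """Hypothetical risk score with one interaction term."""
    a, b, c = features
    return 0.1 + 0.3 * a + 0.2 * b * c

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between the two features involved in it.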