Explainable LightGBM Approach for Predicting Myocardial Infarction Mortality
Ana Letícia Garcez Vicente, Roseval Donisete Malaquias Junior, Roseli A. F. Romero
TL;DR
This work addresses predicting mortality in myocardial infarction patients using tabular hospital data with missing values. It evaluates ensemble boosted-tree methods (XGBoost, LightGBM, CatBoost) under different preprocessing regimes and leverages Tree SHAP for model interpretability. A key finding is that LightGBM without preprocessing achieves top performance (F1 ≈ 0.912, accuracy ≈ 0.918) and that preprocessing may not always improve results, as shown by an ablation study. The results suggest a simpler, more interpretable pipeline could robustly support clinical decisions such as hospital admission guidance for acute myocardial infarction (AMI) patients.
Abstract
Myocardial infarction is a leading cause of mortality globally, and accurate risk prediction is crucial for improving patient outcomes. Machine learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and many missing values, posing challenges for feature selection and imputation methods. In this article, we investigate the impact of data preprocessing and compare three ensemble boosted-tree methods for predicting the risk of mortality in patients with myocardial infarction. Further, we use the Tree Shapley Additive Explanations (Tree SHAP) method to identify relationships among all the features underlying the predictions, leveraging the entirety of the available data in the analysis. Notably, our approach outperformed existing machine learning approaches, achieving an F1-score of 91.2% and an accuracy of 91.8% for LightGBM without data preprocessing.
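To make the idea of an additive explanation concrete, the following minimal sketch computes exact Shapley values by brute force for a small hand-written, tree-like risk scorer. It is not the Tree SHAP algorithm and not the model used in this work: Tree SHAP obtains the same kind of attribution efficiently for trained tree ensembles, whereas this sketch simply enumerates all feature orderings. The scorer `toy_risk`, its three features (age, ejection fraction, potassium), its thresholds, and the reference patient are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def toy_risk(features):
    """Hypothetical hand-written tree-like scorer (NOT a trained model)."""
    age, ejection_fraction, potassium = features
    score = 0.1
    if age > 70:
        score += 0.3
        # Interaction: low ejection fraction only matters for older patients here.
        if ejection_fraction < 40:
            score += 0.2
    if potassium > 5.0:
        score += 0.1
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values relative to a single reference point.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from the baseline value to the value in x.
    The cost grows factorially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in phi]


# Hypothetical patient and reference values, for illustration only.
patient = (78, 35, 5.4)
reference = (60, 55, 4.2)

phi = shapley_values(toy_risk, patient, reference)
for name, value in zip(("age", "ejection_fraction", "potassium"), phi):
    print(f"{name}: {value:+.3f}")
print(f"sum of contributions: {sum(phi):.3f}")
print(f"risk difference:      {toy_risk(patient) - toy_risk(reference):.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that makes per-feature attributions readable alongside a prediction. In this toy case the interaction term is shared between age and ejection fraction rather than being assigned to either one alone.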
