
Explainable Machine Learning for ICU Readmission Prediction

Alex G. C. de Sá, Daniel Gould, Anna Fedyukova, Mitchell Nicholas, Lucy Dockrell, Calvin Fletcher, David Pilcher, Daniel Capurro, David B. Ascher, Khaled El-Khawas, Douglas E. V. Pires

TL;DR

This work addresses ICU readmission prediction by delivering a standardised, explainable ML pipeline trained on multicentric eICU data and externally validated on monocentric MIMIC IV data. Using a Random Forest with 47 balanced features, it attains internal AUC of $0.68$, with a blind-eICU test of $0.672$ and external MIMIC IV validation of $0.616$, demonstrating generalisability across ICU settings. SHAP explanations reveal that early nutritional markers (e.g., albumin) and late-stage renal and hematologic indicators (e.g., BUN, hemoglobin) drive readmission risk, supporting clinically interpretable risk assessment. Calibration and likelihood analyses identify practical thresholds for decision-making, enabling clinicians to flag high-risk patients post-ICU discharge and tailor follow-up, thereby potentially reducing unplanned readmissions and associated morbidity and costs.
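The likelihood analysis mentioned above can be illustrated with a minimal sketch. It assumes a simple rule, "flag a patient when the predicted risk is at or above a threshold", and computes the positive and negative likelihood ratios for that rule. The function name, the toy labels and scores, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
def likelihood_ratios(y_true, y_score, threshold):
    """Return (LR+, LR-) for the rule 'flag if score >= threshold'.

    Assumes y_true holds 0/1 labels and both classes are present.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); infinite if no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - sensitivity) / specificity; infinite if no true negatives.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return lr_pos, lr_neg


# Toy data (illustrative only): 4 readmitted and 8 non-readmitted stays.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.55, 0.4, 0.3, 0.3, 0.1, 0.1, 0.05]
lr_pos, lr_neg = likelihood_ratios(toy_labels, toy_scores, threshold=0.5)
```

Sweeping the threshold and inspecting how LR+ changes is one way a practical cut-off could be chosen; the paper's own procedure may differ.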

Abstract

The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians carry a high level of risk for patients' lives. A comprehensive care pathway must then be followed to reduce complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty of implementing the care pathway uniformly. Readmission contributes to this difficulty: it occurs when patients are admitted again to the ICU within a short timeframe, and it results in high mortality rates and high resource utilisation. Several works have tried to predict readmission from patients' medical information. Although they have some level of success in predicting readmission, those works do not properly assess, characterise and understand readmission prediction. This work proposes a standardised and explainable machine learning pipeline to model patient readmission on a multicentric database (i.e., the eICU cohort with 166,355 patients, 200,859 admissions and 6,021 readmissions) while validating it on monocentric (i.e., the MIMIC IV cohort with 382,278 patients, 523,740 admissions and 5,984 readmissions) and multicentric settings. Our machine learning pipeline achieved predictive performance, in terms of the area under the receiver operating characteristic curve (AUC), of up to 0.7 with a Random Forest classification model, yielding overall good calibration and consistency on validation sets. From explanations provided by the constructed models, we could also derive a set of insightful conclusions, primarily on variables related to vital signs and blood tests (e.g., albumin, blood urea nitrogen and hemoglobin levels), demographics (e.g., age, and admission height and weight), and ICU-associated variables (e.g., unit type). These insights provide an invaluable source of information for clinicians' decision-making when discharging ICU patients.
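To make the AUC figures concrete, here is a minimal sketch of the metric itself: the AUC equals the probability that a randomly chosen readmitted stay receives a higher score than a randomly chosen non-readmitted stay, with ties counted as one half. The function name and toy data are hypothetical. The sketch also computes the readmission prevalence implied by the cohort counts quoted above, under the assumption that readmissions are divided by admissions.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores (illustrative only, not from the paper).
toy_auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.4])

# Prevalence implied by the abstract's counts (assumed denominator: admissions).
eicu_rate = 6021 / 200859    # roughly 3.0% of eICU admissions
mimic_rate = 5984 / 523740   # roughly 1.14% of MIMIC IV admissions
```

With readmissions this rare, accuracy alone would be uninformative, which is one reason a ranking metric such as AUC is a natural choice here.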

Paper Structure (9 sections, 3 figures, 2 tables)

This paper contains 9 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The methodological workflow followed by this work. Data comes from monocentric and multicentric databases (i.e., MIMIC IV and eICU, respectively), where the characterisation of ICU patients, hospitals, and ICU(s) takes place. Preprocessing filters, standardises and imputes data for the development of a 30-day readmission machine learning model, which is built and validated on eICU and tested on MIMIC IV. This model is interpreted and also used to derive explanations from variables and predictions, potentially assisting and guiding clinicians while treating and discharging new patients from the ICU.
  • Figure 2: The predictive performance of the proposed readmission model on 10-fold cross-validation and on a blind test using eICU data. External validation was performed using MIMIC IV data.
  • Figure 3: The SHAP summary plot for our proposed readmission model on eICU training data. We show the 20 features with the highest SHAP values, i.e., those with the greatest impact on the model’s predictive outputs.
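
As a rough illustration of how the "top 20 features" in a plot like Figure 3 can be selected, the sketch below ranks features by their mean absolute SHAP value across samples, a common ordering for such summary plots. It assumes the per-sample SHAP values have already been computed elsewhere; the function name and the tiny matrix of values are hypothetical and are not results from the paper.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=20):
    """Rank features by mean |SHAP| over samples, largest first.

    shap_values: list of rows, one per sample, one value per feature.
    Returns a list of (feature_name, mean_abs_shap) of length <= top_k.
    """
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Made-up SHAP values for three samples and three features (illustration only).
toy_features = ["albumin", "bun", "hemoglobin"]
toy_shap = [
    [0.2, -0.1, 0.05],
    [-0.4, 0.1, 0.0],
    [0.3, -0.3, -0.05],
]
ranking = rank_by_mean_abs_shap(toy_shap, toy_features, top_k=2)
top_names = [name for name, _ in ranking]
```

The sign of each SHAP value is discarded for ranking only; the summary plot itself still shows whether high or low feature values push the prediction towards readmission.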