Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution

Alexander Bakumenko, Janine Hoelscher, Hudson Smith

TL;DR

This work introduces a transparent, lightweight multimodal ensemble for early ICU mortality prediction using the first 48 hours of physiological time-series and unstructured clinical notes. A bidirectional LSTM handles vitals while a finetuned ClinicalModernBERT processes notes; their probability outputs are converted to standardized logits and combined by a logistic regression meta-learner, yielding a calibrated risk score. The architecture provides multilevel interpretability, including per-case modality attribution and within-branch Integrated Gradients explanations, and stays robust to a missing modality through a calibrated fallback. On MIMIC-III, the ensemble achieves AUPRC 0.565 and AUROC 0.891, outperforming the best single-modality model (AUPRC 0.526, AUROC 0.876), with well-calibrated predictions and per-case explanations that support clinical governance and deployment.

Abstract

Early identification of intensive care patients at risk of in-hospital mortality enables timely intervention and efficient resource allocation. Despite high predictive performance, existing machine learning approaches lack transparency and robustness, limiting clinical adoption. We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes. This traceable architecture allows for multilevel interpretability: feature attributions within each modality and direct per-case modality attributions quantifying how vitals and notes influence each decision. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model (AUPRC 0.565 vs. 0.526; AUROC 0.891 vs. 0.876) while maintaining well-calibrated predictions. The system remains robust through a calibrated fallback when a modality is missing. These results demonstrate competitive performance with reliable, auditable risk estimates and transparent, predictable operation, which together are crucial for clinical use.
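The late-fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the standardization statistics, weights, and bias below are hypothetical placeholders (in practice they would be estimated from data), and the share formula (absolute weighted logit, normalized across modalities) is one plausible reading of "per-case modality attribution", assumed here for illustration. The fallback branch likewise assumes that the inputs are already calibrated specialist probabilities and that the remaining specialist's probability is returned when the other is missing.

```python
import math

def logit(p, eps=1e-6):
    """Convert a probability to a logit, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values for illustration only (not taken from the paper).
STATS = {"ts": (-2.0, 1.5), "cn": (-2.2, 1.3)}  # (mean, std) of each branch's logits
WEIGHTS = {"ts": 0.9, "cn": 1.1}                # meta-learner coefficients
BIAS = -2.1                                     # meta-learner intercept

def fuse(p_ts, p_cn):
    """Return (risk, shares) from the vitals (ts) and notes (cn) probabilities.

    Pass None for a missing modality.
    """
    probs = {"ts": p_ts, "cn": p_cn}
    available = {k: v for k, v in probs.items() if v is not None}
    if not available:
        raise ValueError("at least one modality is required")
    if len(available) == 1:
        # Assumed fallback: use the remaining specialist's probability as is.
        (name, p), = available.items()
        return p, {name: 1.0}

    # Per-modality contribution to the fused logit: weight * standardized logit.
    contrib = {}
    for name, p in probs.items():
        mean, std = STATS[name]
        contrib[name] = WEIGHTS[name] * (logit(p) - mean) / std

    risk = sigmoid(BIAS + sum(contrib.values()))
    total = sum(abs(c) for c in contrib.values())
    # Assumed share definition: normalized absolute contribution.
    shares = {k: (abs(c) / total if total > 0 else 0.5) for k, c in contrib.items()}
    return risk, shares

risk, shares = fuse(0.30, 0.60)
print(f"risk={risk:.3f}", {k: round(v, 3) for k, v in shares.items()})
```

Because the meta-learner is linear in the standardized logits, each modality's contribution can be read off directly from its weighted term, which is what makes a per-case split between vitals and notes possible without a separate attribution method.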

Paper Structure

This paper contains 40 sections, 2 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Transparent multimodal ensemble for early ICU mortality prediction. Two specialist models, a bidirectional LSTM for time-series vitals (TS) and a finetuned ClinicalModernBERT transformer for clinical notes (CN), produce probability outputs that are converted to standardized logits. A logistic regression model (meta-learner) combines these logits to generate a calibrated in-hospital mortality (IHM) risk score. Orange blocks indicate modular, independently trainable components. The system provides multilevel explainable AI (XAI): Integrated Gradients attributions identify influential vitals variables/time-steps and note tokens within each specialist, while per-case modality shares quantify how vitals and notes influence a decision. When a modality is unavailable, the system falls back to the calibrated probability, ensuring graceful degradation.
  • Figure 2: Precision–Recall Curve for all specialist classifiers.
  • Figure 3: AUPRC discrimination (mean [95% CI]) comparison of six meta-learning algorithms for the LSTM + ft_ClinicalModernBERT pairing.
  • Figure 4: AUPRC discrimination (mean [95% CI]) for all specialist models and the selected ensemble on the test set. Models emb_ModernBERT, emb_ClinicalModernBERT, and ft_ClinicalModernBERT are denoted as emb_MBERT, emb_CMBERT, and ft_CMBERT, respectively.
  • Figure 5: Reliability diagrams before and after isotonic regression for (a) the LSTM vitals specialist, (b) the ft_ClinicalModernBERT notes specialist, and (c) the logistic stacker ensemble. Each panel shows pre-calibration (raw probabilities) and post-calibration (isotonic regression) curves; points closer to the diagonal indicate better calibration.
  • ...and 4 more figures
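
The calibration step behind the reliability diagrams in Figure 5 can be sketched as follows. This is an illustrative, standard-library version of isotonic regression (pool-adjacent-violators) plus the binning used to draw a reliability diagram; the function names, the toy data, and the equal-width binning are assumptions made for this example, not details taken from the paper.

```python
def isotonic_fit(scores, labels):
    """Fit a non-decreasing map from score to outcome rate (pool adjacent violators).

    Returns (score, calibrated_value) pairs sorted by score.
    Simplification: tied scores are not pooled in this sketch.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (means compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [(score, f) for (score, _), f in zip(pairs, fitted)]

def reliability_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width bins.

    Returns (mean predicted probability, observed event rate, count)
    for each non-empty bin.
    """
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[i][0] += p
        bins[i][1] += y
        bins[i][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in bins if n > 0]

# Toy data for illustration only.
raw = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcome = [0, 0, 1, 0, 0, 1, 1, 1]
calibrated = [v for _, v in isotonic_fit(raw, outcome)]
print(reliability_bins(raw, outcome))
print(reliability_bins(calibrated, outcome))
```

In a reliability diagram, each bin's observed rate is plotted against its mean predicted probability, so bins that sit on the diagonal indicate good calibration. A real pipeline would fit the isotonic map on held-out data and apply it to new scores rather than fitting and evaluating on the same cases as this toy example does.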