Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution

Alexander Bakumenko; Janine Hoelscher; Hudson Smith

TL;DR

This work introduces a transparent, lightweight multimodal ensemble for early ICU mortality prediction from the first 48 hours of physiological time-series and unstructured clinical notes. A bidirectional LSTM models the vitals and a finetuned ClinicalModernBERT processes the notes; their predicted probabilities are converted to standardized logits and combined by a logistic regression meta-learner, yielding a calibrated risk score. The architecture provides multilevel interpretability, including per-case modality attribution and within-branch Integrated Gradients explanations, and demonstrates robustness to missing modalities through calibrated fallbacks. On MIMIC-III, the ensemble achieves AUPRC 0.565 and AUROC 0.891, outperforming either single-modality model, with well-calibrated predictions and actionable explanations that support clinical governance and deployment.
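
The fusion step can be sketched as logistic-regression stacking on standardized logits. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `to_logit` and `LogitStackingMetaLearner` are hypothetical, the meta-learner is fit with plain gradient descent and a small L2 penalty (the paper's solver is not specified here), each base model's logits are standardized with their mean and standard deviation on the fitting set, and the per-case modality attribution is read as each modality's additive term w_m * z_m in the fused logit. Synthetic probabilities stand in for the BiLSTM and ClinicalModernBERT outputs.

```python
import numpy as np


def to_logit(p, eps=1e-6):
    """Convert probabilities to log-odds, clipping away from 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


class LogitStackingMetaLearner:
    """Hypothetical late-fusion meta-learner: logistic regression on standardized logits.

    Column m of `probs` holds the probability from base model m (e.g. vitals, notes).
    """

    def __init__(self, lr=0.1, n_iter=2000, l2=1e-3):
        self.lr, self.n_iter, self.l2 = lr, n_iter, l2

    def _standardize(self, probs):
        return (to_logit(probs) - self.mean_) / self.std_

    def fit(self, probs, y):
        # Assumption: fit on held-out base-model predictions, not their training set
        z = to_logit(probs)
        self.mean_ = z.mean(axis=0)
        self.std_ = z.std(axis=0) + 1e-12
        x = self._standardize(probs)
        y = np.asarray(y, dtype=float)
        self.w_ = np.zeros(x.shape[1])
        self.b_ = 0.0
        for _ in range(self.n_iter):
            p = 1.0 / (1.0 + np.exp(-(x @ self.w_ + self.b_)))
            # Gradient step on mean log loss plus an L2 penalty on the weights
            self.w_ -= self.lr * (x.T @ (p - y) / len(y) + self.l2 * self.w_)
            self.b_ -= self.lr * np.mean(p - y)
        return self

    def contributions(self, probs):
        """Per-case, per-modality additive terms w_m * z_m of the fused logit."""
        return self._standardize(probs) * self.w_

    def predict_proba(self, probs):
        fused_logit = self.contributions(probs).sum(axis=1) + self.b_
        return 1.0 / (1.0 + np.exp(-fused_logit))


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
signal = 2 * y - 1
# Synthetic stand-ins for held-out base-model outputs (notes made more informative)
p_vitals = 1 / (1 + np.exp(-(1.0 * signal + rng.normal(0, 1.5, y.size))))
p_notes = 1 / (1 + np.exp(-(1.5 * signal + rng.normal(0, 1.5, y.size))))
probs = np.column_stack([p_vitals, p_notes])

meta = LogitStackingMetaLearner().fit(probs, y)
risk = meta.predict_proba(probs[:3])     # fused risk for three cases
contrib = meta.contributions(probs[:3])  # columns: vitals, notes
```

Because the fused logit is exactly the intercept plus the sum of the per-modality terms, each case's attribution is additive by construction, which is what makes a linear meta-learner straightforward to audit.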

Abstract

Early identification of intensive care patients at risk of in-hospital mortality enables timely intervention and efficient resource allocation. Despite high predictive performance, existing machine learning approaches lack transparency and robustness, limiting clinical adoption. We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes. This traceable architecture allows for multilevel interpretability: feature attributions within each modality and direct per-case modality attributions quantifying how vitals and notes influence each decision. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model (AUPRC 0.565 vs. 0.526; AUROC 0.891 vs. 0.876) while maintaining well-calibrated predictions. The system remains robust through a calibrated fallback when a modality is missing. These results demonstrate competitive performance with reliable, auditable risk estimates and transparent, predictable operation, which together are crucial for clinical use.
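
One way to realize the calibrated fallback is to route each case by which modalities are present: the fused ensemble when both exist, otherwise the surviving modality's probability passed through its own calibrator. The sketch below assumes Platt scaling (a one-feature logistic regression on the base model's logit, fit on held-out data) as that calibrator; the paper's exact fallback calibration may differ, and all names, including `risk_with_fallback` and the fusion weights in `example_fuse`, are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class PlattCalibrator:
    """Platt scaling (assumed calibrator): risk = sigmoid(a * logit(p) + b)."""

    def __init__(self) -> None:
        self.a, self.b = 1.0, 0.0  # identity mapping before fitting

    def fit(self, probs: Sequence[float], labels: Sequence[int],
            lr: float = 0.05, n_iter: int = 2000) -> "PlattCalibrator":
        # Batch gradient descent on mean log loss over held-out predictions
        zs = [logit(p) for p in probs]
        n = len(zs)
        for _ in range(n_iter):
            grad_a = grad_b = 0.0
            for z, y in zip(zs, labels):
                err = sigmoid(self.a * z + self.b) - y
                grad_a += err * z
                grad_b += err
            self.a -= lr * grad_a / n
            self.b -= lr * grad_b / n
        return self

    def __call__(self, p: float) -> float:
        return sigmoid(self.a * logit(p) + self.b)


def risk_with_fallback(
    p_vitals: Optional[float],
    p_notes: Optional[float],
    fuse: Callable[[float, float], float],
    calib_vitals: PlattCalibrator,
    calib_notes: PlattCalibrator,
) -> Tuple[float, str]:
    """Return (risk, source); use the fused ensemble only when both modalities exist."""
    if p_vitals is not None and p_notes is not None:
        return fuse(p_vitals, p_notes), "ensemble"
    if p_vitals is not None:
        return calib_vitals(p_vitals), "vitals_only"
    if p_notes is not None:
        return calib_notes(p_notes), "notes_only"
    raise ValueError("at least one modality must be available")


def example_fuse(p_vitals: float, p_notes: float) -> float:
    # Hypothetical fused model with made-up weights, standing in for the meta-learner
    return sigmoid(0.8 * logit(p_vitals) + 1.1 * logit(p_notes) - 0.5)


random.seed(0)
# Synthetic validation set: the notes model is overconfident (true log-odds = 0.5 * its logit)
val_probs, val_labels = [], []
for _ in range(400):
    z = random.gauss(0.0, 2.0)
    val_probs.append(sigmoid(z))
    val_labels.append(1 if random.random() < sigmoid(0.5 * z) else 0)

calib_notes = PlattCalibrator().fit(val_probs, val_labels)
calib_vitals = PlattCalibrator()  # left unfitted here, i.e. the identity mapping
risk, source = risk_with_fallback(None, 0.9, example_fuse, calib_vitals, calib_notes)
```

Returning the source label alongside the risk keeps it visible which path produced each score, so fallback predictions can be audited separately from full-ensemble ones.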
