Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution
Alexander Bakumenko, Janine Hoelscher, Hudson Smith
TL;DR
This work introduces a transparent, lightweight multimodal ensemble for early ICU mortality prediction from the first 48 hours of physiological time-series and unstructured clinical notes. A bidirectional LSTM handles vitals while a fine-tuned ClinicalModernBERT processes notes; their predicted probabilities are converted to standardized logits and combined by a logistic regression meta-learner, yielding a calibrated risk score. The architecture provides multilevel interpretability, including per-case modality attribution and within-branch Integrated Gradients explanations, and stays robust to missing modalities through calibrated fallbacks. On MIMIC-III, the ensemble achieves AUPRC 0.565 and AUROC 0.891, outperforming either single modality, with well-calibrated predictions and actionable explanations suited to clinical governance and deployment.
Abstract
Early identification of intensive care patients at risk of in-hospital mortality enables timely intervention and efficient resource allocation. Existing machine learning approaches achieve high predictive performance but often lack transparency and robustness, which limits clinical adoption. We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a fine-tuned ClinicalModernBERT transformer for notes. This traceable architecture enables multilevel interpretability: feature attributions within each modality, and direct per-case modality attributions that quantify how vitals and notes influence each decision. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model (AUPRC 0.565 vs. 0.526; AUROC 0.891 vs. 0.876) while maintaining well-calibrated predictions. When a modality is missing, the system remains robust through a calibrated fallback. These results demonstrate competitive performance with reliable, auditable risk estimates and transparent, predictable operation, which together are crucial for clinical use.
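To make the late-fusion step concrete, the following is a minimal sketch of how a logistic regression meta-learner over two standardized logits could produce both a fused risk and a per-case modality attribution. It is not the authors' implementation: the names `fuse`, `MU`, `SIGMA`, `WEIGHTS`, and `BIAS`, and every numeric value, are hypothetical placeholders standing in for quantities that would be fitted on held-out training predictions. Because the meta-learner is linear in the standardized logits, each modality's contribution to the final log-odds is simply its weight times its standardized logit.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Map a probability to log-odds, clipping to avoid infinities."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def sigmoid(x):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters (assumptions, not values from the paper).
# MU / SIGMA: mean and std of each branch's logits on the fitting set.
# WEIGHTS / BIAS: coefficients of the logistic regression meta-learner.
MU = {"vitals": -2.0, "notes": -2.2}
SIGMA = {"vitals": 1.5, "notes": 1.3}
WEIGHTS = {"vitals": 0.9, "notes": 1.1}
BIAS = -2.1

def fuse(p_vitals, p_notes):
    """Return (fused risk, per-modality log-odds contributions)."""
    probs = {"vitals": p_vitals, "notes": p_notes}
    contributions = {}
    for name, p in probs.items():
        z = (logit(p) - MU[name]) / SIGMA[name]      # standardized logit
        contributions[name] = float(WEIGHTS[name] * z)  # additive in log-odds
    score = BIAS + sum(contributions.values())
    return float(sigmoid(score)), contributions

if __name__ == "__main__":
    risk, contrib = fuse(p_vitals=0.30, p_notes=0.60)
    print(f"fused risk: {risk:.3f}")
    for name, c in contrib.items():
        print(f"{name} contribution to log-odds: {c:+.3f}")
```

The contributions plus the bias sum exactly to the log-odds of the fused risk, which is what makes the attribution auditable for each case. The missing-modality fallback is left out of this sketch, since the text above does not specify how it is calibrated.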
