Multimodal Deep Learning for Early Prediction of Patient Deterioration in the ICU: Integrating Time-Series EHR Data with Clinical Notes

Binesh Sadanandan

Abstract

Early identification of patients at risk for clinical deterioration in the intensive care unit (ICU) remains a critical challenge. Delayed recognition of impending adverse events, including mortality, vasopressor initiation, and mechanical ventilation, contributes to preventable morbidity and mortality. We present a multimodal deep learning approach that combines structured time-series data (vital signs and laboratory values) with unstructured clinical notes to predict patient deterioration within 24 hours. Using the MIMIC-IV database, we constructed a cohort of 74,822 ICU stays and generated 5.7 million hourly prediction samples. Our architecture employs a bidirectional LSTM encoder for temporal patterns in physiologic data and ClinicalBERT embeddings for clinical notes, fused through a cross-modal attention mechanism. We also present a systematic review of existing approaches to ICU deterioration prediction, identifying 31 studies published between 2015 and 2024. Most existing models rely solely on structured data and achieve area under the curve (AUC) values between 0.70 and 0.85. Studies incorporating clinical notes remain rare but show promise for capturing information not present in structured fields. Our multimodal model achieves a test AUROC of 0.7857 and AUPRC of 0.1908 on 823,641 held-out samples, with a validation-to-test gap of only 0.6 percentage points. Ablation analysis validates the multimodal approach: clinical notes improve AUROC by 2.5 percentage points and AUPRC by 39.2% relative to a structured-only baseline, while deep learning models consistently outperform classical baselines (XGBoost AUROC: 0.7486, logistic regression: 0.7171). This work contributes both a thorough review of the field and a reproducible multimodal framework for clinical deterioration prediction.
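The ablation figures mix two kinds of improvement: the AUROC gain is stated in absolute percentage points, while the AUPRC gain is relative to the baseline. The Python sketch below illustrates the distinction using only numbers quoted in the abstract. The helper names are hypothetical, and the structured-only values it back-computes are merely implied by the rounded figures above; they are not reported results.

```python
def absolute_gain_pp(baseline: float, improved: float) -> float:
    """Absolute improvement in percentage points for a metric on a 0-1 scale."""
    return (improved - baseline) * 100.0


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative improvement, expressed as a percentage of the baseline value."""
    return (improved - baseline) / baseline * 100.0


# Values quoted in the abstract.
multimodal_auroc = 0.7857
multimodal_auprc = 0.1908
xgboost_auroc = 0.7486
logreg_auroc = 0.7171

# Gaps to the classical baselines, in percentage points.
gap_xgboost_pp = absolute_gain_pp(xgboost_auroc, multimodal_auroc)
gap_logreg_pp = absolute_gain_pp(logreg_auroc, multimodal_auroc)

# ASSUMPTION: back-computed from the rounded "2.5 percentage points" and
# "39.2% relative" statements; approximate, not values reported by the paper.
implied_auroc = multimodal_auroc - 2.5 / 100.0
implied_auprc = multimodal_auprc / (1.0 + 39.2 / 100.0)

print(f"AUROC gap vs XGBoost: {gap_xgboost_pp:.2f} pp")
print(f"AUROC gap vs logistic regression: {gap_logreg_pp:.2f} pp")
print(f"Implied structured-only AUROC: {implied_auroc:.4f}")
print(f"Implied structured-only AUPRC: {implied_auprc:.4f}")
```

The same AUPRC change reads as a large relative gain but only a few points in absolute terms, which is typical when the baseline value is small.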

Paper Structure (53 sections, 8 equations, 6 figures, 11 tables)

Figures (6)

  • Figure 1: Architecture of the multimodal ICU deterioration prediction model. Vital signs and laboratory values are encoded by a bidirectional LSTM with temporal attention. Clinical notes are embedded using frozen ClinicalBERT weights and projected to a shared representation space. A gated cross-modal fusion mechanism dynamically weights contributions from both modalities. The fused representation passes through a classification head to predict deterioration probability within 24 hours. Tensor dimensions are shown in parentheses (B = batch size).
  • Figure 2: Receiver operating characteristic curves (left) and precision-recall curves (right) for all five models on the held-out test set (n = 823,641). The multimodal model achieves the highest AUROC (0.7857) and AUPRC (0.1908). The precision-recall curves are especially informative given the 2.8% positive rate; the dashed line marks the random baseline.
  • Figure 3: Distribution of predicted probabilities from the multimodal model on the test set. Most negative samples (blue) cluster at lower probabilities, while positive samples (red) show a wider spread with a right-shifted tail. The vertical dashed line marks the optimized decision threshold (0.47). The substantial overlap between distributions reflects the difficulty of this prediction task at 2.8% prevalence.
  • Figure 4: Calibration curves for the three deep learning models. Points show the observed positive fraction versus the mean predicted probability in each decile bin. The dashed diagonal indicates perfect calibration. All models tend to overestimate risk at lower predicted probabilities and underestimate it at higher values. The multimodal model achieves the best calibration (ECE = 0.213), though all models have room for improvement through post-hoc calibration methods.
  • Figure 5: Test set performance comparison across all five models: (a) AUROC and (b) AUPRC. The multimodal model achieves the highest scores on both metrics. Adding clinical notes to structured data improves AUROC by 2.5 percentage points and AUPRC by 39.2% in relative terms. Deep learning models outperform classical baselines, and the text-only model performs near chance.
  • ...and 1 more figures
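The Figure 1 caption describes a gated fusion step that dynamically weights the two modalities. One common way to build such a gate is sketched below in plain Python: a sigmoid gate computed from the concatenated time-series and text vectors, then used to blend them element by element. This is an illustrative assumption, not the paper's confirmed formulation; the function names, the 4-dimensional vectors, and the random weights are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(weights, bias, x):
    """Dense layer: one output per weight row, y_i = w_i . x + b_i."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_fusion(h_ts, h_txt, gate_w, gate_b):
    """Blend two same-length vectors with a learned per-dimension gate.

    ASSUMPTION: gate = sigmoid(W [h_ts; h_txt] + b) and
    fused = gate * h_ts + (1 - gate) * h_txt. The paper may differ.
    """
    concat = list(h_ts) + list(h_txt)
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, concat)]
    fused = [g * a + (1.0 - g) * b for g, a, b in zip(gate, h_ts, h_txt)]
    return fused, gate


# Hypothetical 4-dimensional example with random (untrained) parameters.
random.seed(0)
dim = 4
h_ts = [random.uniform(-1, 1) for _ in range(dim)]   # stand-in for the LSTM output
h_txt = [random.uniform(-1, 1) for _ in range(dim)]  # stand-in for the projected note embedding
gate_w = [[random.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
gate_b = [0.0] * dim

fused, gate = gated_fusion(h_ts, h_txt, gate_w, gate_b)
print("gate:", [round(g, 3) for g in gate])
print("fused:", [round(f, 3) for f in fused])
```

Because each gate value lies strictly between 0 and 1, every fused coordinate is a convex combination of the two inputs, so a gate near 1 leans on the structured data and a gate near 0 leans on the notes.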
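The Figure 3 caption refers to an optimized decision threshold of 0.47. As a rough sketch of how such a threshold turns probabilities into alerts, the plain-Python function below computes precision and recall at a given cut-off. The function name and toy data are hypothetical, and treating a score equal to the threshold as positive is an assumption.

```python
def precision_recall_at(probs, labels, threshold=0.47):
    """Precision and recall when predicting positive for prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold  # ASSUMPTION: ties count as positive
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy scores and outcomes, not data from the paper.
toy_probs = [0.10, 0.50, 0.60, 0.30]
toy_labels = [0, 1, 0, 1]
precision, recall = precision_recall_at(toy_probs, toy_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

At low prevalence, a classifier that flags stays at random has expected precision equal to the positive rate. This is why the precision-recall curves in Figure 2 are compared against a baseline at the 2.8% positive rate and not against 0.5.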
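The Figure 4 caption reports expected calibration error (ECE) over decile bins. A minimal sketch of one standard way to compute it follows: sort predictions, split them into equal-frequency bins, and take the sample-weighted mean of the absolute gap between observed positive fraction and mean predicted probability. Equal-frequency binning is an assumption based on the phrase "decile bin"; the paper may use equal-width bins, and this sketch is not expected to reproduce the reported 0.213.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-frequency bins (deciles when n_bins=10).

    ECE = sum over bins of (bin_size / N) * |observed_rate - mean_predicted|.
    """
    # Stable sort by predicted probability only.
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one prediction")
    ece = 0.0
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # happens when there are fewer samples than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        ece += (len(chunk) / n) * abs(observed - mean_pred)
    return ece


# Toy example: predictions of 0.9 for ten negatives are badly overconfident.
overconfident = expected_calibration_error([0.9] * 10, [0] * 10)
print(f"ECE (overconfident toy data): {overconfident:.3f}")
```

A large ECE alongside a useful AUROC is not contradictory: ranking quality and probability calibration are separate properties, which is why the caption points to post-hoc calibration as a remedy.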