SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis

Munib Mesinovic, Tingting Zhu

TL;DR

SurvBench addresses a critical preprocessing gap in survival analysis with electronic health records by delivering a raw-to-tensor, configuration-driven preprocessing pipeline that supports multi-modal data from MIMIC-IV, eICU, and MC-MED. It standardises data handling across time-series, static features, ICD codes, and radiology embeddings, while enforcing patient-level data splitting and explicit missingness masks to reduce leakage and improve model interpretability. The pipeline includes horizon truncation, discrete-time binning, scalable time-series aggregation, and multi-modal integration, with outputs tuned for compatibility with pycox and similar survival modelling tools. This work enables fair, reproducible benchmarking of survival methods and accelerates methodological innovation by removing ad hoc data engineering as a confounding factor, thereby facilitating robust cross-dataset generalisation for critical care survival analysis.

Abstract

Electronic health record (EHR) data present tremendous opportunities for advancing survival analysis through deep learning, yet reproducibility remains severely constrained by inconsistent preprocessing methodologies. We present SurvBench, a comprehensive, open-source preprocessing pipeline that transforms raw PhysioNet datasets into standardised, model-ready tensors for multi-modal survival analysis. SurvBench provides data loaders for three major critical care databases, MIMIC-IV, eICU, and MC-MED, supporting diverse modalities including time-series vitals, static demographics, ICD diagnosis codes, and radiology reports. The pipeline implements rigorous data quality controls, patient-level splitting to prevent data leakage, explicit missingness tracking, and standardised temporal aggregation. SurvBench handles both single-risk (e.g., in-hospital mortality) and competing-risks scenarios (e.g., multiple discharge outcomes). The outputs are compatible with the pycox library and with implementations of standard statistical and deep learning models. By providing reproducible, configuration-driven preprocessing with comprehensive documentation, SurvBench addresses the "preprocessing gap" that has hindered fair comparison of deep learning survival models, enabling researchers to focus on methodological innovation rather than data engineering.
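The patient-level splitting mentioned in the abstract can be sketched as follows. This is a minimal illustration and not SurvBench's own code: the function name `patient_level_split`, the `stay_to_patient` mapping, and the 70/15/15 fractions are all assumptions made for the example.

```python
import random
from collections import defaultdict


def patient_level_split(stay_to_patient, fractions=(0.7, 0.15), seed=0):
    """Split stays into train/val/test so that no patient spans two splits.

    stay_to_patient: dict mapping a stay ID to its patient ID (hypothetical input).
    fractions: share of patients for train and val; the remainder goes to test.
    """
    patients = sorted(set(stay_to_patient.values()))
    random.Random(seed).shuffle(patients)  # seeded for reproducibility

    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    split_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"

    # Every stay inherits the split of its patient.
    splits = defaultdict(list)
    for stay, pid in stay_to_patient.items():
        splits[split_of[pid]].append(stay)
    return dict(splits)


# Toy example: 10 patients with 2 stays each.
stays = {f"stay{p}_{k}": f"patient{p}" for p in range(10) for k in range(2)}
splits = patient_level_split(stays)
```

Shuffling patient IDs rather than stay IDs is the point of the design: a split made directly on stays could place two encounters of the same patient in both the training and test sets, which is the leakage the abstract describes.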

Paper Structure

This paper contains 52 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Publication trends of deep learning survival analysis methods (2020–2024). The histogram shows the number of published deep learning methods for survival analysis by year, highlighting notable architectural innovations including parametric models (DeepWeiSurv, 2020), competing risks frameworks (DeepCompete, 2021), transformer-based approaches (TransformerJM, 2022), proportional hazards in neural models (CoxNAM, 2023), and complex inference models (DySurv/SurvMamba, 2024). This proliferation of methods without corresponding standardisation of preprocessing and benchmarking infrastructure motivates the development of SurvBench. Data compiled from major machine learning and medical informatics venues, including NeurIPS, ICML, ICLR, AAAI, CHIL, and domain-specific journals.
  • Figure 2: Overview of the SurvBench preprocessing and modelling pipeline. Multimodal source data, including static features (e.g., demographics, comorbidities via ICD codes), time-series (e.g., vital signs, lab results), and radiology (e.g., free-text reports), are extracted from the eICU, MIMIC-IV, and MC-MED databases. The pipeline generates labels for both single-risk (e.g., mortality) and competing-risks (e.g., ED disposition) survival tasks. A critical preprocessing step involves patient-level data splitting to prevent data leakage from repeated hospital encounters. Processed data for each patient is structured into aligned tensors with corresponding binary missingness masks. These tensors serve as inputs to deep learning models, such as a multi-head architecture designed to estimate cause-specific cumulative incidence functions for competing risks.
  • Figure 3: Kaplan-Meier survival curve for the eICU cohort. The blue line represents the survival probability estimate with 95% confidence intervals (shaded region) over the 240-hour prediction horizon. The curve demonstrates a monotonic decrease with approximately 26% cumulative event rate, consistent with typical ICU mortality patterns. Hours since admission is the time from ICU entry to either in-hospital mortality (event) or discharge alive/horizon truncation (censoring).
  • Figure 4: Distribution of event and censoring durations. Stacked histogram showing time-to-event (orange) and time-to-censoring (blue) distributions across the 240-hour horizon. The prominent spike at 240 hours in the censored category is a result of the horizon truncation. The event distribution exhibits characteristic right-skew with higher density in the acute phase (0–100 hours), while the overall 10% event rate reflects typical ICU mortality prevalence.
  • Figure 5: Temporal trajectories of dynamic features. Mean values (lines) and standard deviations (shaded bands) for ten randomly selected features across six 4-hour windows, computed from observed values only (using missingness masks). Features are z-score standardised, demonstrating successful normalisation with comparable scales. Trajectories exhibit clinically plausible patterns: laboratory values (PT-INR, total protein) show gradual changes, while vital signs (paO2, MCV) display relative stability. The declining paO2 trajectory may indicate progressive respiratory deterioration characteristic of critically ill populations.
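
Two operations that the captions for Figures 4 and 5 refer to, horizon truncation and window means computed from observed values only, can be sketched as below. This is an illustrative sketch under stated assumptions, not the pipeline's implementation: the function names are hypothetical, the defaults (240-hour horizon, six 4-hour windows) are taken from the captions, and treating an event exactly at the horizon as observed is a convention chosen for the example.

```python
import math


def truncate_at_horizon(duration_h, event, horizon_h=240.0):
    """Administratively censor any stay that outlasts the horizon."""
    if duration_h > horizon_h:
        # Whatever happens after the horizon is unseen, so the stay is censored.
        return horizon_h, 0
    return duration_h, event


def window_means_with_mask(times_h, values, window_h=4.0, n_windows=6):
    """Average observed measurements per window and return (means, mask).

    mask[i] is 1 when window i holds at least one observed value, else 0.
    Empty windows receive a placeholder 0.0 that the mask flags as missing.
    """
    sums = [0.0] * n_windows
    counts = [0] * n_windows
    for t, v in zip(times_h, values):
        if v is None or math.isnan(v):
            continue  # skip unobserved measurements
        idx = int(t // window_h)
        if 0 <= idx < n_windows:
            sums[idx] += v
            counts[idx] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    mask = [1 if c else 0 for c in counts]
    return means, mask


# A stay lasting 300 hours is censored at the 240-hour horizon.
print(truncate_at_horizon(300.0, 1))  # (240.0, 0)

# Two observed values fall in the first window; the rest is missing or out of range.
means, mask = window_means_with_mask(
    [0.5, 3.0, 9.0, 30.0], [80.0, 90.0, float("nan"), 70.0]
)
print(means, mask)
```

The truncation step explains the spike at 240 hours among censored stays in Figure 4: every stay longer than the horizon collapses onto that value. Keeping the mask alongside the means lets downstream statistics, such as the trajectories in Figure 5, ignore placeholder values in empty windows.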