Deep Survival Analysis of Longitudinal EHR Data for Joint Prediction of Hospitalization and Death in COPD Patients

Enrico Manzini, Thomas Gonzalez Saito, Joan Escudero, Ana Génova, Cristina Caso, Tomas Perez-Porcuna, Alexandre Perera-Lluna

TL;DR

This study targets time-to-event prediction in COPD, using longitudinal EHR data to jointly forecast hospitalization and death under semi-competing risks. It compares statistical, ML, and DL approaches; the DL models that take longitudinal sequences as input, Dynamic Deep-Hit (DDH) and Deep Recurrent Survival Machine (DRSM), deliver the strongest predictive performance, particularly for hospitalization. By estimating cause-specific hazards $\alpha_{ij}(t)$ and evaluating with time-dependent (dynamic) AUC, the work demonstrates the value of temporal modeling for risk stratification in COPD. Internal validation shows clear gains for recurrent DL architectures; external validation and transformer-based EHR pretraining remain future directions for improving generalizability and clinical impact.

Abstract

Patients with chronic obstructive pulmonary disease (COPD) have an increased risk of hospitalizations, strongly associated with decreased survival, yet predicting the timing of these events remains challenging and has received limited attention in the literature. In this study, we performed survival analysis to predict hospitalization and death in COPD patients using longitudinal electronic health records (EHRs), comparing statistical models, machine learning (ML), and deep learning (DL) approaches. We analyzed data from more than 150k patients from the SIDIAP database in Catalonia, Spain, from 2013 to 2017, modeling hospitalization as a first event and death as a semi-competing terminal event. Multiple models were evaluated, including Cox proportional hazards, SurvivalBoost, DeepPseudo, SurvTRACE, Dynamic Deep-Hit, and Deep Recurrent Survival Machine. Results showed that DL models utilizing recurrent architectures outperformed both ML and linear approaches in concordance and time-dependent AUC, especially for hospitalization, which proved to be the harder event to predict. This study is, to our knowledge, the first to apply deep survival analysis on longitudinal EHR data to jointly predict multiple time-to-event outcomes in COPD patients, highlighting the potential of DL approaches to capture temporal patterns and improve risk stratification.
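For readers unfamiliar with the notation, the cause-specific (transition) hazards $\alpha_{ij}(t)$ can be written in the standard multi-state form below. This is a sketch under assumptions not stated in the source: the state labels ($0$ = event-free, $1$ = hospitalized, $2$ = dead), the state process $X(t)$, and the Markov condition are illustrative choices, and the paper's own indexing may differ.

```latex
% Assumed state labels: 0 = event-free, 1 = hospitalized, 2 = dead.
% X(t) denotes the state occupied at time t (hypothetical notation).
\[
  \alpha_{ij}(t)
  = \lim_{\Delta t \to 0^{+}}
    \frac{P\bigl(X(t + \Delta t) = j \mid X(t) = i\bigr)}{\Delta t},
  \qquad (i, j) \in \{(0,1),\ (0,2),\ (1,2)\}.
\]
```

Under this labeling, $\alpha_{01}$ and $\alpha_{02}$ compete for event-free patients, while $\alpha_{12}$ governs death after a hospitalization. This is what makes the setting semi-competing: death can prevent hospitalization, but hospitalization does not prevent death.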

Paper Structure

This paper contains 20 sections, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Schematic representation of survival data from longitudinal EHRs (left) and the transition model for semi-competing events (right). The models aim to predict the cause-specific hazard functions $\alpha_{ij}(t)$ from the input longitudinal EHRs. Patients can experience one of the two events (Subjects 1 and 4) or be right-censored (Subject 2). Moreover, subjects can experience event 2 after event 1 (Subject 3). In this case, the first event is included in the input sequence when predicting the second event.
  • Figure 2: Aalen–Johansen estimator of the cumulative incidence functions for the two events, stratified by sex.
  • Figure 3: Sum of the absolute values of the SHAP estimation coefficients over time steps for the DDH model, weighted by the time attention of the model.
  • Figure 4: Hazard ratios of the cause-specific Cox-PH regressor.
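
The Aalen–Johansen estimator named in Figure 2 can be illustrated with a minimal NumPy sketch for the competing-risks case (time to first event only). This is not the authors' code: the function name `aalen_johansen_cif`, the event coding (0 = censored, 1 = hospitalization, 2 = death), and the toy data are all assumptions made for illustration.

```python
import numpy as np


def aalen_johansen_cif(times, events, cause):
    """Cumulative incidence of `cause` for first events under competing risks.

    times  : observed follow-up time per subject
    events : 0 = right-censored, 1..K = cause of the first event (assumed coding)
    Returns (distinct event times, cumulative incidence at those times).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events > 0])

    surv = 1.0  # all-cause event-free probability just before the current time
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = np.sum(times >= t)
        d_cause = np.sum((times == t) & (events == cause))
        d_any = np.sum((times == t) & (events > 0))
        # Probability of reaching t event-free, then failing from `cause` at t.
        cif += surv * d_cause / at_risk
        # Update the event-free probability with events of any cause.
        surv *= 1.0 - d_any / at_risk
        curve.append(cif)
    return event_times, np.array(curve)


# Toy data (hypothetical): 1 = hospitalization, 2 = death, 0 = censored.
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 0, 2, 1, 0, 2]
grid, cif_hosp = aalen_johansen_cif(toy_times, toy_events, cause=1)
_, cif_death = aalen_johansen_cif(toy_times, toy_events, cause=2)
```

A stratified curve such as the one in Figure 2 would be obtained by calling the function separately on each subgroup (for example, one call per sex). Unlike applying Kaplan-Meier to each cause in isolation, this estimator does not treat competing events as censoring, so the cause-specific incidences never sum to more than one.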