Deep Survival Analysis of Longitudinal EHR Data for Joint Prediction of Hospitalization and Death in COPD Patients
Enrico Manzini, Thomas Gonzalez Saito, Joan Escudero, Ana Génova, Cristina Caso, Tomas Perez-Porcuna, Alexandre Perera-Lluna
TL;DR
This study addresses time-to-event prediction in COPD, using longitudinal EHR data to jointly forecast hospitalization and death under semi-competing risks. It compares statistical, ML, and DL approaches; the DL models that take longitudinal sequences as input, Dynamic Deep-Hit (DDH) and Deep Recurrent Survival Machine (DRSM), deliver the strongest predictive performance, particularly for hospitalization. By estimating cause-specific hazards $\alpha_{ij}(t)$ and evaluating with dynamic AUC metrics, the work demonstrates the value of temporal modeling for risk stratification in COPD. Internal validation shows clear gains for recurrent DL architectures; external validation and transformer-based EHR pretraining remain future directions for improving generalizability and clinical impact.
Abstract
Patients with chronic obstructive pulmonary disease (COPD) have an increased risk of hospitalization, which is strongly associated with decreased survival, yet predicting the timing of these events remains challenging and has received limited attention in the literature. In this study, we performed survival analysis to predict hospitalization and death in COPD patients using longitudinal electronic health records (EHRs), comparing statistical models, machine learning (ML), and deep learning (DL) approaches. We analyzed data from more than 150,000 patients from the SIDIAP database in Catalonia, Spain, from 2013 to 2017, modeling hospitalization as a first event and death as a semi-competing terminal event. Multiple models were evaluated, including Cox proportional hazards, SurvivalBoost, DeepPseudo, SurvTRACE, Dynamic Deep-Hit, and Deep Recurrent Survival Machine. Results showed that DL models using recurrent architectures outperformed both ML and linear approaches in concordance and time-dependent AUC, especially for hospitalization, which proved to be the harder event to predict. This study is, to our knowledge, the first to apply deep survival analysis to longitudinal EHR data to jointly predict multiple time-to-event outcomes in COPD patients, highlighting the potential of DL approaches to capture temporal patterns and improve risk stratification.
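To make the semi-competing setup concrete, the following minimal sketch shows one way a patient's follow-up could be encoded as two time-to-event outcomes, with hospitalization as the non-terminal event and death as the terminal event. It is an illustration under stated assumptions, not the encoding used in the study: the function name `encode_semi_competing`, the `Outcome` type, times measured in days from cohort entry, and the choice to censor hospitalization at death are all hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Outcome(NamedTuple):
    time: float  # observed time (days since cohort entry, an assumed unit)
    event: int   # 1 = event observed, 0 = censored


def encode_semi_competing(
    hosp_time: Optional[float],
    death_time: Optional[float],
    followup_end: float,
) -> Tuple[Outcome, Outcome]:
    """Return (hospitalization, death) outcomes for one patient.

    Assumption of this sketch: death stops the clock for hospitalization,
    but a hospitalization never stops the clock for death.
    """
    # Death is terminal: it is observed whether or not the patient
    # was hospitalized first, and censored only at end of follow-up.
    if death_time is not None and death_time <= followup_end:
        death = Outcome(death_time, 1)
    else:
        death = Outcome(followup_end, 0)

    # Hospitalization can only be observed up to death or end of follow-up.
    # A hospitalization recorded at exactly the death time counts as observed.
    horizon = death.time
    if hosp_time is not None and hosp_time <= horizon:
        hosp = Outcome(hosp_time, 1)
    else:
        hosp = Outcome(horizon, 0)

    return hosp, death


if __name__ == "__main__":
    # Hypothetical patient: hospitalized at day 100, died at day 400,
    # with follow-up ending at day 1825.
    print(encode_semi_competing(100, 400, 1825))
```

The asymmetry is the point of the sketch: a patient who dies without hospitalization contributes a censored hospitalization time at the moment of death, whereas a hospitalized patient remains at risk of death afterwards. Other conventions, such as labeling death as a distinct competing event for hospitalization, are equally plausible.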
