SeqRisk: Transformer-augmented latent variable model for robust survival prediction with longitudinal data
Mine Öğretir, Miika Koskinen, Juha Sinisalo, Risto Renkonen, Harri Lähdesmäki
TL;DR
SeqRisk targets robust survival prediction from irregular, high-dimensional longitudinal data by learning latent trajectories with a VAE or LVAE, then aggregating them via a transformer before predicting risk with a nonlinear Cox head. The model is trained end-to-end with a joint objective that blends generative reconstruction via the ELBO and survival discrimination via the partial likelihood, enabling latent representations that are both informative and hazard-relevant. Across Survival MNIST, PhysioNet, and CHD datasets, SeqRisk variants with transformer aggregation show strong discrimination and robustness to missingness, often outperforming classical baselines and competing deep models, while calibration remains competitive. This approach offers a principled, data-efficient framework for longitudinal survival analysis with real-world irregular data, supporting earlier risk stratification and potential clinical decision support.
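One plausible way to write the joint objective described above is sketched here. The notation is an assumption made for illustration and is not taken from the source: $z_i$ is the transformer-aggregated latent representation of patient $i$, $r_\theta$ is the risk score from the nonlinear Cox head, $t_i$ is the observed time, $\delta_i$ is the event indicator, and $\lambda \ge 0$ is a hypothetical weight balancing the two terms.

```latex
\mathcal{L}(\theta,\phi)
  = -\,\mathrm{ELBO}(\theta,\phi)
    \;-\; \lambda \sum_{i:\,\delta_i = 1}
      \Big[\, r_\theta(z_i) \;-\; \log \sum_{j:\, t_j \ge t_i} \exp\big(r_\theta(z_j)\big) \Big]
```

The second term is the standard negative Cox partial log-likelihood. Minimizing the sum pushes the latent representation to both reconstruct the observations and order patients by hazard.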
Abstract
In healthcare, risk assessment of patient outcomes has long been based on survival analysis, i.e., modeling time-to-event associations. However, conventional approaches rely on data from a single time point, which makes them suboptimal for leveraging longitudinal patient history and capturing temporal regularities. Focusing on real-world clinical data and acknowledging its challenges, we use latent variable models to handle irregular, noisy, and sparsely observed longitudinal data. We propose SeqRisk, a method that combines a variational autoencoder (VAE) or longitudinal VAE (LVAE) with transformer-based sequence aggregation and a Cox proportional hazards module for risk prediction. SeqRisk captures long-range interactions, improves predictive accuracy and generalizability, and provides partial explainability of sample population characteristics to help identify high-risk patients. SeqRisk demonstrated robust performance under increasing sparsity, consistently surpassing existing approaches.
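To make the Cox proportional hazards module more concrete, here is a minimal NumPy sketch of the negative Cox partial log-likelihood that such a module would typically minimize. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `weight` parameter, and the way the survival term is added to a negative ELBO are all hypothetical, and tied event times are handled in the simple Breslow manner.

```python
import numpy as np


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (Breslow-style handling of ties).

    risk  : predicted log-risk score per subject, shape (n,)
    time  : observed event or censoring time per subject, shape (n,)
    event : 1/True if the event was observed, 0/False if censored, shape (n,)
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # at_risk[i, j] is True when subject j is still at risk at subject i's time.
    at_risk = time[None, :] >= time[:, None]

    # Log-sum-exp over each risk set, stabilised by subtracting the max score.
    shift = risk.max()
    weighted = np.exp(risk - shift)[None, :] * at_risk
    log_denominator = shift + np.log(weighted.sum(axis=1))

    # Only subjects with an observed event contribute a term.
    return -np.sum((risk - log_denominator)[event])


def joint_loss(neg_elbo, risk, time, event, weight=1.0):
    """Hypothetical joint objective: negative ELBO plus a weighted survival term."""
    return neg_elbo + weight * cox_neg_partial_log_likelihood(risk, time, event)
```

With equal risk scores the loss reduces to the sum of the log risk-set sizes at each observed event, which gives a quick sanity check. In a trainable model the same computation would be written in an autodiff framework so that gradients reach the encoder and the transformer aggregator.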
