Modeling Complex Disease Trajectories using Deep Generative Models with Semi-Supervised Latent Processes
Cécile Trottet, Manuel Schürch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators
TL;DR
This work introduces a deep probabilistic temporal model that jointly models multivariate clinical measurements and sparse medical concept labels over time, using a learnable latent process $\boldsymbol{z}$ guided by semi-supervised medical concepts to achieve interpretable disentanglement. The model employs a factorized generative structure $p_{\psi}(\boldsymbol{y}, \boldsymbol{x}, \boldsymbol{z} \vert \boldsymbol{c})$, with a learnable prior $p_{\phi}(\boldsymbol{z} \vert \boldsymbol{c})$, a measurement likelihood $p_{\pi}(\boldsymbol{x} \vert \boldsymbol{z}, \boldsymbol{c})$, and a guidance network $p_{\gamma}(\boldsymbol{y} \vert \boldsymbol{z}, \boldsymbol{c})$, trained via a variational objective that balances unsupervised reconstruction and supervised guidance with KL regularization. Applied to EUSTAR data for systemic sclerosis, the approach learns organ-specific latent subspaces, enables online prediction with quantified uncertainty, and supports cohort analyses, clustering, and patient similarity assessments in the latent space. The results demonstrate accurate forecasting, reliable uncertainty quantification, and interpretable latent representations that align with clinically meaningful concepts, while revealing potential disease subtypes and trajectories. The framework is extensible to continuous-time priors, conditional trajectory generation under interventions, and inclusion of additional organs and concepts, offering a scalable tool for hypothesis generation and clinical decision support in complex diseases.
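The factorized structure named above can be written out as a short worked equation. This is a sketch read off the three components listed in the summary, assuming that $\psi$ collects the parameters $\{\gamma, \pi, \phi\}$ and that $\boldsymbol{y}$ and $\boldsymbol{x}$ are conditionally independent given $\boldsymbol{z}$ and $\boldsymbol{c}$:

$$
p_{\psi}(\boldsymbol{y}, \boldsymbol{x}, \boldsymbol{z} \vert \boldsymbol{c}) = p_{\gamma}(\boldsymbol{y} \vert \boldsymbol{z}, \boldsymbol{c}) \, p_{\pi}(\boldsymbol{x} \vert \boldsymbol{z}, \boldsymbol{c}) \, p_{\phi}(\boldsymbol{z} \vert \boldsymbol{c})
$$

A variational objective that balances reconstruction, guidance, and KL regularization could then take the following form. The approximate posterior $q_{\theta}(\boldsymbol{z} \vert \boldsymbol{x}, \boldsymbol{c})$ and the weights $\alpha$ and $\beta$ are illustrative assumptions, since the summary does not specify them:

$$
\mathcal{L}(\psi, \theta) = \mathbb{E}_{q_{\theta}(\boldsymbol{z} \vert \boldsymbol{x}, \boldsymbol{c})}\left[ \log p_{\pi}(\boldsymbol{x} \vert \boldsymbol{z}, \boldsymbol{c}) + \alpha \log p_{\gamma}(\boldsymbol{y} \vert \boldsymbol{z}, \boldsymbol{c}) \right] - \beta \, \mathrm{KL}\left( q_{\theta}(\boldsymbol{z} \vert \boldsymbol{x}, \boldsymbol{c}) \,\Vert\, p_{\phi}(\boldsymbol{z} \vert \boldsymbol{c}) \right)
$$

Under this reading, the first term is the unsupervised reconstruction of the measurements, the second is the supervised guidance from the concept labels (evaluated only where labels are observed, since they are sparse), and the KL term pulls the approximate posterior toward the learnable prior.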
Abstract
In this paper, we propose a deep generative time series approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories. We aim to find meaningful temporal latent representations of an underlying generative process that explain the observed disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical concepts. Combining the generative approach with medical knowledge allows the model to discover novel aspects of the disease while still incorporating established medical concepts. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering the disease into new sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series, including uncertainty quantification. We demonstrate the effectiveness of our approach in modeling systemic sclerosis, showcasing the potential of our machine learning model to capture complex disease trajectories and acquire new medical knowledge.
