Genetics-Driven Personalized Disease Progression Model

Haoyu Yang, Sanjoy Dey, Pablo Meyer

TL;DR

The paper tackles heterogeneity in chronic disease progression by introducing PerDPM, a genetics-driven personalized disease progression model. It jointly learns genetic groupings from GWAS data via a variational autoencoder and disease-state trajectories via a genetics-conditioned state-space model, enabling patient-specific progression patterns. The model integrates two modules (genetic makeups inference and genetics-driven state transitions) and optimizes an ELBO that couples VAE reconstruction with an RNN-based state dynamics framework. It demonstrates improved fit and state recovery on both synthetic data and a large real-world CKD cohort from UK Biobank, highlighting the value of coupling genomic information with longitudinal clinical data for precision medicine.

Abstract

Modeling disease progression through multiple stages is critical for clinical decision-making in chronic diseases such as cancer, diabetes, and chronic kidney disease. Existing approaches often model disease progression as a uniform trajectory pattern at the population level. However, chronic diseases are highly heterogeneous and often have multiple progression patterns depending on a patient's individual genetics and on environmental effects due to lifestyle. We propose a personalized disease progression model that jointly learns the heterogeneous progression patterns and groups of genetic profiles. In particular, an end-to-end pipeline simultaneously infers patient characteristics from genetic markers using a variational autoencoder and models how these characteristics drive disease progression using an RNN-based state-space model fit to clinical observations. Our proposed model shows improvement on real-world and synthetic clinical data.
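
As a rough illustration of how a VAE term and a state-space term can share one objective, the following is a generic evidence lower bound of the kind used for deep state-space models. It is a sketch under assumptions, not the paper's own equation: $\mathbf{G}$ (a patient's genetic markers), the factorization of the generative model $p_\theta$, and the form of the inference network $q_\phi$ are all assumed here for illustration; $\mathbf{V}$ denotes the genetic cluster variable, $\mathbf{Z}_t$ the latent disease state, and $\mathbf{X}_t$ the clinical observation at time $t$.

$$
\begin{aligned}
\log p_\theta(\mathbf{X}_{1:T}, \mathbf{G}) \;\ge\;&
\underbrace{\mathbb{E}_{q_\phi(\mathbf{V}\mid\mathbf{G})}\big[\log p_\theta(\mathbf{G}\mid\mathbf{V})\big]
- \mathrm{KL}\big(q_\phi(\mathbf{V}\mid\mathbf{G}) \,\|\, p(\mathbf{V})\big)}_{\text{VAE on genetic markers}} \\
&+ \underbrace{\sum_{t=1}^{T} \mathbb{E}_{q_\phi}\big[\log p_\theta(\mathbf{X}_t\mid\mathbf{Z}_t)\big]
- \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\Big[\mathrm{KL}\big(q_\phi(\mathbf{Z}_t\mid\mathbf{Z}_{t-1}, \mathbf{X}_{1:T}, \mathbf{V}) \,\|\, p_\theta(\mathbf{Z}_t\mid\mathbf{Z}_{t-1}, \mathbf{V})\big)\Big]}_{\text{genetics-conditioned state-space model}}
\end{aligned}
$$

In this assumed form, the only coupling between the two parts is $\mathbf{V}$: it is inferred from genetics in the first line and conditions the state transition prior in the second, which is one way a joint, end-to-end training signal can arise.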

Paper Structure

This paper contains 23 sections, 23 equations, 6 figures, 4 tables, 3 algorithms.

Figures (6)

  • Figure 1: State-Space Model (SSM). Observations $\mathbf X_t$ are generated from latent variables $\mathbf Z_t$. Lines denote the generative process. Unlike in an RNN, the latent representation $\mathbf Z_t$ in an SSM is stochastic rather than deterministic.
  • Figure 2: Architecture of the proposed model. The model has two key components: genetic makeups inference and genetics-driven state transition. The other components, e.g., inference network and emission model, are omitted in this figure.
  • Figure 3: Analysis on synthetic datasets: The plot shows the mean values of clinical observations within different sample groups. The sample groups are discovered using the cluster variable $\mathbf V$, which is inferred by our proposed model. The x-axis represents the time steps, and the y-axis represents the clinical observation values. The plot demonstrates that the latent variable $\mathbf V$ in our proposed model can discriminate between patient groups with different disease progression types.
  • Figure 4: Analysis on Real-World Datasets: The plot illustrates the predicted CKD states (lines) versus the true CKD states (dots) for one patient along the timeline. Each line represents the probability of the predicted state, with the corresponding values displayed on the left-hand side of the Y-axis. Each dot represents the true CKD states graded by eGFR score, with the discretized state number positioned on the left-hand side of the Y-axis.
  • Figure 5: Analysis on Real-World Datasets: The plot displays the aggregated predicted CKD states (lines) versus the true CKD states (dots) for one patient along the timeline. The orange line represents the risk of having CKD, while the blue line represents the risk of not having CKD. The blue dots correspond to the true CKD stages, with a higher stage number indicating more severe CKD.
  • ...and 1 more figure
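
The distinction drawn in the Figure 1 caption, and the group separation described for Figure 3, can be illustrated with a toy simulation. The sketch below is not the paper's model: the linear-Gaussian transition and emission, the discrete cluster label `v`, the per-group matrices in `A_by_group`, and all function names are assumptions chosen for brevity. It only shows that an RNN update is a deterministic function of its inputs, while an SSM transition samples the next latent state, here with dynamics that depend on the patient's group.

```python
import numpy as np

def rnn_step(h_prev, x, W_h, W_x):
    """Deterministic RNN update: identical inputs always give the same state."""
    return np.tanh(W_h @ h_prev + W_x @ x)

def ssm_step(z_prev, v, A_by_group, trans_std, rng):
    """Stochastic SSM transition: sample z_t around a group-specific mean.

    `v` is a hypothetical discrete cluster label selecting the dynamics.
    """
    mean = A_by_group[v] @ z_prev
    return mean + trans_std * rng.standard_normal(z_prev.shape)

def simulate_patient(v, T, A_by_group, C, rng, trans_std=0.02, obs_std=0.02):
    """Roll out T steps of latent states Z and noisy observations X."""
    dim_x, dim_z = C.shape
    z = np.ones(dim_z)  # assumed common initial state
    Z = np.empty((T, dim_z))
    X = np.empty((T, dim_x))
    for t in range(T):
        z = ssm_step(z, v, A_by_group, trans_std, rng)
        Z[t] = z
        X[t] = C @ z + obs_std * rng.standard_normal(dim_x)  # emission
    return Z, X

rng = np.random.default_rng(0)
# Two assumed progression types: group 0 decays, group 1 grows.
A_by_group = {0: 0.9 * np.eye(2), 1: 1.05 * np.eye(2)}
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # latent -> 3 observed features

_, X_slow = simulate_patient(0, 20, A_by_group, C, rng)
_, X_fast = simulate_patient(1, 20, A_by_group, C, rng)
print("final mean observation, group 0:", X_slow[-1].mean())
print("final mean observation, group 1:", X_fast[-1].mean())
```

Averaging the observations per group over many simulated patients would give curves analogous in spirit to the group means plotted in Figure 3, although in the paper the grouping is inferred from genetic data rather than fixed by hand.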