Learning Disease Progression Models That Capture Health Disparities

Erica Chiang, Divya Shanmugam, Ashley N. Beecy, Gabriel Sayer, Deborah Estrin, Nikhil Garg, Emma Pierson

TL;DR

This work tackles bias in disease progression models caused by health disparities in initial severity, progression rate, and visit frequency. It introduces an interpretable Bayesian progression model with group-specific parameters and a Poisson visit process, and proves that all parameters are identifiable. Theoretical results show that ignoring disparities biases severity estimates, and synthetic experiments validate both parameter recovery and the predicted bias. Application to NewYork-Presbyterian heart failure data reveals higher severity and distinct disparity patterns in non-white groups, and accounting for disparities materially shifts which patients are classified as high-risk. Overall, the study provides a disparity-aware framework for more accurate and equitable disease progression inference that can generalize to other chronic diseases.

Abstract

Disease progression models are widely used to inform the diagnosis and treatment of many progressive diseases. However, a significant limitation of existing models is that they do not account for health disparities that can bias the observed data. To address this, we develop an interpretable Bayesian disease progression model that captures three key health disparities: certain patient populations may (1) start receiving care only when their disease is more severe, (2) experience faster disease progression even while receiving care, or (3) receive follow-up care less frequently conditional on disease severity. We show theoretically and empirically that failing to account for any of these disparities can result in biased estimates of severity (e.g., underestimating severity for disadvantaged groups). On a dataset of heart failure patients, we show that our model can identify groups that face each type of health disparity, and that accounting for these disparities while inferring disease severity meaningfully shifts which patients are considered high-risk.
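
The three disparities named in the abstract can be made concrete with a small simulation. The sketch below is not the authors' implementation. It assumes linear progression ($Z_t = Z_0 + R\,t$), Gaussian distributions for initial severity and progression rate, a single noisy symptom, and a log-linear visit rate. Every parameter name (`mu_z0`, `mu_r`, `beta_a`, `beta_z`, `beta_0`, and the noise scales) is hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_patients(group, T, mu_z0, mu_r, beta_a,
                      beta_0=-1.0, beta_z=0.5,
                      sigma_z0=1.0, sigma_r=0.1, sigma_x=0.5, seed=0):
    """Simulate severities Z, visit indicators D, and symptoms X.

    All functional forms here are illustrative assumptions, not the paper's.
    group  : integer group index per patient, shape (n,)
    mu_z0  : per-group mean initial severity (disparity 1)
    mu_r   : per-group mean progression rate (disparity 2)
    beta_a : per-group offset to the log visit rate (disparity 3)
    """
    rng = np.random.default_rng(seed)
    group = np.asarray(group)
    mu_z0, mu_r, beta_a = (np.asarray(v, dtype=float) for v in (mu_z0, mu_r, beta_a))
    n = group.shape[0]

    z0 = rng.normal(mu_z0[group], sigma_z0)       # latent initial severity
    r = rng.normal(mu_r[group], sigma_r)          # latent progression rate
    t = np.arange(T)
    z = z0[:, None] + r[:, None] * t[None, :]     # latent severity, shape (n, T)

    # Assumed log-linear visit rate that depends on severity and group.
    lam = np.exp(beta_0 + beta_z * z + beta_a[group][:, None])
    # D_t = 1 if a Poisson process with rate lam has at least one event in step t.
    d = rng.random((n, T)) < 1.0 - np.exp(-lam)

    # Symptoms are only observed at visits; unobserved entries are NaN.
    x = np.where(d, z + rng.normal(0.0, sigma_x, size=(n, T)), np.nan)
    return z, d.astype(int), x
```

Under these assumptions, a group with a larger `mu_z0` entry starts care at higher severity, a larger `mu_r` entry means faster progression, and a more negative `beta_a` entry means fewer visits at the same severity.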

Paper Structure

This paper contains 49 sections, 7 theorems, 21 equations, 9 figures, and 3 tables.

Key Result

Theorem 1

All model parameters are identified by the observed data distribution $P({X_t}, D_t \mid A)$.

Figures (9)

  • Figure 1: Disease progression generative model. Plate diagram captures $N$ patients over $T$ timesteps. Shaded nodes indicate observed features: demographics $A^{(i)}$, visit indicator $D_t^{(i)}$, and symptoms ${X_t}^{(i)}$ (only observed when $D_t^{(i)} = 1$). Unshaded nodes indicate latent variables: a patient's initial severity ${Z_0}^{(i)}$, rate of progression ${R}^{(i)}$, and severity ${Z_t}^{(i)}$. Red arrows indicate dependencies capturing health disparities.
  • Figure 2: Well-calibrated severity estimates. Each dot shows the mean true vs. mean recovered severity values for one group in a given simulation trial. Groups depicted in red tend to be underserved compared to groups depicted in blue. Our full model produces accurate and well-calibrated severity estimates (estimates lie near dotted $y=x$ line).
  • Figure 3: Inferred model parameters with 95% confidence intervals. Shared parameters (top) are consistent with medical knowledge of heart failure progression. Group-specific parameters (bottom) are plotted as differences compared to White patients, so confidence intervals that are non-overlapping with 0 (colored in purple) indicate significant racial/ethnic differences in parameters.
  • Figure 4: Accounting for disparities leads to less biased severity estimates. We visualize the improvement of our full model (blue) over one that does not account for disparities but is otherwise the same (yellow) in two ways. On the top, we show each group's average difference from the overall mean severity, normalized by the overall standard deviation of severity. On the bottom, we capture the portion of each group that is identified as "high-risk" (top quartile of disease severity).
  • Figure S1: We can calculate $\mathbb{E}[{Z_t} \mid E_t=1]$ by taking the expectation over the blue region, with each point having probability $p(z)f(x)$.
  • ...and 4 more figures
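
The two summaries described in the Figure 4 caption (each group's standardized difference from the overall mean severity, and the share of each group in the top severity quartile) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `group_severity_summary` is hypothetical, the standard deviation is the population standard deviation, and "top quartile" is taken to mean strictly above the 75th percentile of all severities. The paper may define these details differently.

```python
import numpy as np

def group_severity_summary(severity, group, quantile=0.75):
    """Per-group standardized severity gap and high-risk share.

    severity : inferred severity per patient, shape (n,)
    group    : group label per patient, shape (n,)
    """
    severity = np.asarray(severity, dtype=float)
    group = np.asarray(group)

    overall_mean = severity.mean()
    overall_sd = severity.std()                  # population SD (assumption)
    cutoff = np.quantile(severity, quantile)     # high-risk threshold (assumption)

    summary = {}
    for g in np.unique(group):
        s = severity[group == g]
        summary[g.item()] = {
            # Group mean minus overall mean, in units of the overall SD.
            "standardized_gap": float((s.mean() - overall_mean) / overall_sd),
            # Fraction of the group whose severity exceeds the cutoff.
            "high_risk_share": float((s > cutoff).mean()),
        }
    return summary
```

Running the function once on estimates from a disparity-aware model and once on estimates from a model without group-specific parameters would give the kind of side-by-side comparison the caption describes.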

Theorems & Definitions (8)

  • Theorem 1
  • Definition 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 6
  • Lemma 7
  • Lemma 8