Bayesian Counterfactual Prediction Models for HIV Care Retention with Incomplete Outcome and Covariate Information

Arman Oganisian, Joseph Hogan, Edwin Sang, Allison DeLong, Ben Mosong, Hamish Fraser, Ann Mwangi

TL;DR

This paper presents an all-in-one Bayesian approach for predicting HIV care retention and optimizing visit scheduling, and applies it to EHR from the Academic Model Providing Access to Healthcare (AMPATH), a consortium of clinics that treat HIV in Western Kenya.

Abstract

Like many chronic diseases, human immunodeficiency virus (HIV) is managed over time at regular clinic visits. At each visit, patient features are assessed, treatments are prescribed, and a subsequent visit is scheduled. There is a need for data-driven methods for both predicting retention and recommending scheduling decisions that optimize retention. Prediction models can be useful for estimating retention rates across a range of scheduling options. However, training such models with electronic health records (EHR) involves several complexities. First, formal causal inference methods are needed to adjust for observed confounding when estimating retention rates under counterfactual scheduling decisions. Second, competing events such as death preclude retention, while censoring events render retention missing. Third, inconsistent monitoring of features such as viral load and CD4 count leads to covariate missingness. This paper presents an all-in-one approach for both predicting HIV retention and optimizing scheduling while accounting for these complexities. We formulate and identify causal retention estimands in terms of potential return-time under a hypothetical scheduling decision. Flexible Bayesian approaches are used to model the observed return-time distribution while accounting for competing and censoring events and form posterior point and uncertainty estimates for these estimands. We address the urgent need for data-driven decision support in HIV care by applying our method to EHR from the Academic Model Providing Access to Healthcare (AMPATH), a consortium of clinics that treat HIV in Western Kenya.

Paper Structure

This paper contains 9 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) Some possible patient trajectories in our data. Time zero (start of follow-up) is the time of the initial enrollment visit, $V_1=0$. At the enrollment visit, available history is used to schedule a return visit $S_1$ weeks later (measured from $V_1$). The actual waiting time until return is $W_1$. Subject 1 returns for visit 2 $W_1$ weeks after $V_1$, at time $V_2$. This is after their scheduled return time, $S_1$, so they are delayed by $W_1-S_1$ weeks. At time $V_2$, their third visit is scheduled $S_2$ weeks later. Again, they are late and arrive $W_2$ weeks later at time $V_3$. This proceeds until the subject dies at time $T$, before their next scheduled return time. Subject 2, on the other hand, returns for their second visit $W_1$ weeks after their initial visit, at time $V_2$. They return earlier than their scheduled return time of $S_1$, so their delay time $W_1-S_1$ is negative. At their third visit, Subject 2 was scheduled to return in $S_3$ weeks, but was censored at time $C$ (either because they were transferred out of care or due to the end of the data cut). (b) $\Delta$-Retention, defined as $Y_k(\Delta) = I( W_k - S_k < \Delta, \delta=1)$, is missing for subjects censored before time $V_k+S_k+\Delta$, but observed for subjects censored after that time.
  • Figure 2: Unadjusted estimates of the hazard of return time from visit $j=1$, stratified by scheduled time. The Bayesian model is able to capture clumping around the scheduled visit time, as seen by the spike in the hazard estimate at the dashed vertical line. In addition, as seen from weeks 50-150 in the second panel and around week 100 in the third panel, the gAR1 prior smooths the hazard estimates even where the frequentist point estimates are jumpy due to low numbers of patients at risk.
  • Figure 3: For all plots, $j=1$ and $\Delta=90/7\approx13$. Left: Posterior mean and 95% credible interval of $\Psi_j^s(h_{ji};\Delta)$ for each subject $i$ in the test set under their scheduled return time $s=s_i$. Middle: For each subject, this plot displays the posterior mean of $\Psi_j^s(h_{ji};\Delta)$ on the x-axis along with the width of the credible interval of $\Psi_j^s(h_{ji};\Delta)$. Right: Each subject's posterior PMF of the optimal scheduling time, represented in a Machina triangle. $P( s_j^*(h_j) = 2 \mid \mathcal{D})$ is on the x-axis and $P( s_j^*(h_j) = 4 \mid \mathcal{D})$ is on the y-axis. The red point in the middle represents maximal uncertainty, with each option having 1/3 posterior probability of being the optimal option. Patients closest to vertices $(0,0)$, $(0,1)$, and $(1,0)$ are those with option 8, 4, and 2 as the posterior mode optimal rule, respectively.
  • Figure 4: Left: For $j=1$ and $\Delta=90/7$, each point represents $\Psi_j^s(h_{ji};\Delta)$ for a subject $i$ in the test set under their scheduled return time $s=s_i$, estimated under BART vs. the formulated Bayesian Transition Model. Due to low censoring rates, the predictions produced by the two approaches are quite consistent, as shown by the scattering of points around the 45-degree line. Right: For one subject, $i$, in the test set, BMT produced estimates of $\Psi_1^s(h_{1i};\Delta)$ for various $\Delta$ under each possible scheduling decision $s\in\{2,4,8\}$. This plot illustrates the ability of our Bayesian Transition Model to produce point and interval estimates under a variety of retention definitions, $\Delta$, and scheduling options without the need to re-run the models. It illustrates that optimization for a single $\Delta$, while dominant in the HIV/AIDS literature, is not global and motivates future work on optimization via more general utility functions.
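
The $\Delta$-retention outcome in Figure 1(b) can be sketched in a few lines of Python. This is only an illustration of the caption's definition, not the authors' code. The function name `delta_retention`, the event labels, and the argument names are hypothetical. The sketch assumes that $\delta=1$ marks an observed return, that death before return gives $Y_k(\Delta)=0$ (a competing event precludes retention), and that censoring exactly at $V_k+S_k+\Delta$ counts as observed.

```python
from typing import Optional


def delta_retention(event: str, weeks_to_event: float,
                    sched_weeks: float, delta: float) -> Optional[int]:
    """Return Y_k(delta) as 1 or 0, or None when the outcome is missing.

    event          -- 'return', 'death', or 'censor' (hypothetical labels)
    weeks_to_event -- weeks from visit time V_k until that event
    sched_weeks    -- scheduled return time S_k, in weeks from V_k
    delta          -- grace period, in weeks
    """
    if event == "return":
        # Observed return (assumed to correspond to delta-indicator = 1):
        # retained if the delay W_k - S_k is below the grace period.
        return int(weeks_to_event - sched_weeks < delta)
    if event == "death":
        # Competing event: the subject never returns, so the indicator is 0.
        return 0
    if event == "censor":
        # Censored at or after V_k + S_k + delta: no return within the
        # window was seen, so the outcome is known to be 0.
        if weeks_to_event >= sched_weeks + delta:
            return 0
        # Censored inside the window: the outcome is missing.
        return None
    raise ValueError(f"unknown event type: {event!r}")


grace = 90 / 7  # the value of Delta used in Figures 3 and 4
print(delta_retention("return", 10, 8, grace))  # 1
print(delta_retention("return", 25, 8, grace))  # 0
print(delta_retention("censor", 10, 8, grace))  # None
print(delta_retention("censor", 30, 8, grace))  # 0
```

Returning `None` for the missing case keeps it distinct from an observed 0, which matters because the two cases are handled differently when the model is fit.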
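
The posterior PMF of the optimal scheduling time shown in the right panel of Figure 3 can be approximated from posterior draws. The sketch below assumes a Monte Carlo approach that the captions do not spell out: for each posterior draw of $\Psi_j^s(h_j;\Delta)$ across $s\in\{2,4,8\}$, pick the option with the highest retention probability, then report how often each option wins. The function name `optimal_schedule_pmf` and the toy draws are hypothetical and are not results from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def optimal_schedule_pmf(draws: Sequence[Sequence[float]],
                         options: Sequence[int] = (2, 4, 8)) -> Dict[int, float]:
    """Estimate P(s* = s | data) for each scheduling option s.

    draws   -- one entry per posterior draw; each entry holds the drawn
               retention probability under each option, aligned with `options`
    options -- candidate scheduling times
    """
    wins = Counter()
    for draw in draws:
        # Index of the option with the largest drawn retention probability.
        # Ties go to the earliest option in `options`.
        best = max(range(len(options)), key=lambda i: draw[i])
        wins[options[best]] += 1
    n_draws = len(draws)
    return {s: wins[s] / n_draws for s in options}


# Four made-up posterior draws for a single patient (illustration only).
toy_draws = [
    (0.70, 0.65, 0.60),  # option 2 is best in this draw
    (0.62, 0.66, 0.61),  # option 4 is best
    (0.71, 0.69, 0.60),  # option 2 is best
    (0.58, 0.64, 0.66),  # option 8 is best
]
pmf = optimal_schedule_pmf(toy_draws)
print(pmf)  # {2: 0.5, 4: 0.25, 8: 0.25}

# Coordinates of this patient in the Machina triangle of Figure 3.
triangle_point = (pmf[2], pmf[4])
print(triangle_point)  # (0.5, 0.25)
```

A patient whose point sits near $(1/3, 1/3)$ has little posterior evidence favoring any option, while a point near a vertex indicates a clear posterior mode.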