
Stage-Aware Learning for Dynamic Treatments

Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu

TL;DR

A novel individualized learning method that estimates the dynamic treatment regime (DTR) by prioritizing alignment between the observed treatment trajectory and the trajectory under the optimal regime across decision stages. It also introduces stage importance scores, together with an attention mechanism, to explicitly account for heterogeneity among decision stages.

Abstract

Recent advances in dynamic treatment regimes (DTRs) facilitate the search for optimal treatments that are tailored to individuals' specific needs and maximize their expected clinical benefits. However, existing algorithms that rely on consistent trajectories, such as inverse probability weighting estimators (IPWEs), can suffer from insufficient sample sizes under the optimal treatments and from a growing number of decision-making stages, particularly in the context of chronic diseases. To address these challenges, we propose a novel individualized learning method that estimates the DTR by prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods. In particular, the proposed learning scheme builds a more general framework that includes the popular outcome weighted learning framework as a special case. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including Fisher consistency and a finite-sample performance bound. Empirically, we evaluate the proposed method in extensive simulated environments and in a real case study of the COVID-19 pandemic.
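The sample-size problem with full matching can be made concrete with a back-of-the-envelope calculation. Assume (hypothetically) a balanced sequentially randomized trial with binary treatments, so that at each of $T$ stages the observed treatment agrees with the target regime independently with probability $p = 0.5$. The number of matching stages is then Binomial($T$, $p$): full matching has probability $p^T$, while requiring only at least $k$ matches retains far more patients. The values $n = 500$ and $T = 10$ mirror the setting of Figure 3; the function name `prob_at_least_k_matches` is illustrative, and this is not the paper's estimator.

```python
from math import comb


def prob_at_least_k_matches(T: int, k: int, p: float = 0.5) -> float:
    """P(at least k of T stages match the target regime).

    Assumes each stage matches independently with the same probability p,
    as in a sequentially randomized design with constant assignment rates.
    """
    return sum(comb(T, j) * p**j * (1 - p) ** (T - j) for j in range(k, T + 1))


n, T = 500, 10
full = prob_at_least_k_matches(T, T)     # full matching: p**T
partial = prob_at_least_k_matches(T, 7)  # at least 7 of 10 stages match

print(f"expected fully matched patients: {n * full:.1f}")        # 0.5
print(f"expected patients with >= 7 matches: {n * partial:.1f}")  # 85.9
```

Under these assumptions, fewer than one of 500 patients is expected to follow the target regime at all ten stages, whereas roughly 86 agree with it on at least seven stages.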

Paper Structure (41 sections, 7 theorems, 62 equations, 8 figures, 9 tables, 1 algorithm)

Key Result

Proposition 1

Under Assumptions A-SUTVA through A-pos, the expected total reward under the target regime $\mathscr{D}$ with $k$ matching stages equals …, and the corresponding maximizing regime $\tilde{\mathscr{D}}_{(k)}$ is defined as …
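Proposition 1 is stated in terms of the number of stages at which a patient's observed treatment agrees with the target regime. The sketch below shows one way that count, and a partial-matching indicator, might be computed. It assumes that "$k$ matching stages" means at least $k$ agreeing stages; the proposition's exact definition and reward formula are not reproduced here. The names `num_matching_stages`, `matches_at_least`, and `threshold_regime` are hypothetical. For brevity, the regime depends only on per-stage features instead of the full patient history.

```python
from typing import Callable, Sequence

# Hypothetical interface: a regime maps (stage index, stage features) to a
# binary treatment label. A real DTR conditions on the full history H_t.
Regime = Callable[[int, Sequence[float]], int]


def num_matching_stages(
    observed: Sequence[int],
    features: Sequence[Sequence[float]],
    regime: Regime,
) -> int:
    """Count stages where the observed treatment equals the regime's recommendation."""
    return sum(
        int(a == regime(t, x)) for t, (a, x) in enumerate(zip(observed, features))
    )


def matches_at_least(
    k: int,
    observed: Sequence[int],
    features: Sequence[Sequence[float]],
    regime: Regime,
) -> bool:
    """Partial-matching indicator: True if at least k stages agree."""
    return num_matching_stages(observed, features, regime) >= k


def threshold_regime(t: int, x: Sequence[float]) -> int:
    """Hypothetical regime: treat (1) when the first feature is positive."""
    return int(x[0] > 0)


observed = [1, 0, 1, 1]
features = [[0.5], [0.2], [-1.0], [2.0]]  # regime recommends [1, 1, 0, 1]
T = len(observed)

print(num_matching_stages(observed, features, threshold_regime))  # 2
print(matches_at_least(T, observed, features, threshold_regime))  # False: not fully matched
print(matches_at_least(2, observed, features, threshold_regime))  # True
```

With $k = T$, the indicator reduces to the full-matching requirement of standard IPWE-type methods. Smaller $k$ keeps trajectories that agree with the regime on only some stages.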

Figures (8)

  • Figure 1: An example to illustrate the curse of full-matching. We consider a sequential randomized trial with a static treatment regime and constant rewards. The assignment rates $P(A_1)$ and $P(A_2|A_1)$ specify the sequential probability of allocating treatments at corresponding decision stages. The total reward $R$, which is the sum of stage immediate rewards $r_1$ and $r_2$ (i.e., $R=r_1+r_2$), evaluates the performance of each treatment arm.
  • Figure 1: Architecture of the stage importance scores searching network. The stage importance scores are treated as attention weights applied to the patients' historical information encoded by the LSTM layer (Hochreiter and Schmidhuber, 1997), and are then estimated by minimizing the MSE between the observed and surrogate total rewards after transformation by the fully connected (FC) layers.
  • Figure 2: Sensitivity plots of estimated total rewards under four function settings against sample sizes. The number of decision stages is set to 5 and there are no important stages for this example.
  • Figure 3: Boxplots of the estimated total rewards of listed methods versus the number of important stages when $n=500$, $T=10$, and the optimal treatment rule is linear and homogeneous.
  • Figure 4: Boxplots of the estimated number of inpatient days by the number of decision stages.
  • ...and 3 more figures
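The architecture figure describes stage importance scores acting as attention weights over per-stage encodings, followed by FC layers and an MSE loss against the observed total reward. Below is a minimal forward-pass sketch of that idea in plain Python, and it rests on several assumptions not taken from the paper. Toy vectors stand in for the LSTM hidden states. Raw scores are normalized with a softmax. A single linear layer replaces the FC layers. Learning the scores by minimizing the loss is omitted. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw stage scores into nonnegative weights summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Weighted sum of per-stage hidden states, using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def surrogate_reward(pooled: Sequence[float], coef: Sequence[float], bias: float) -> float:
    """Single linear layer standing in for the FC layers."""
    return sum(c * p for c, p in zip(coef, pooled)) + bias


def mse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean squared error between surrogate and observed total rewards."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


# Toy inputs: 3 stages, 2-dimensional hidden states (stand-ins for LSTM outputs).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
equal = attention_pool(hidden, [0.0, 0.0, 0.0])    # uniform weights: stage average
focused = attention_pool(hidden, [0.0, 0.0, 5.0])  # most weight on the third stage
pred = [surrogate_reward(equal, [1.0, 1.0], 0.0)]

print(equal, focused)
print(mse(pred, [1.0]))
```

Raising one stage's score shifts the pooled representation toward that stage's hidden state. This is the sense in which larger weights mark more important decision stages.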

Theorems & Definitions (9)

  • Proposition 1
  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6