Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death

Sihyung Park, Wenbin Lu, Shu Yang

TL;DR

This paper addresses the evaluation and learning of dynamic treatment regimes when truncation by death leaves outcomes ill-defined. It adopts a principal-stratification target, the always-survivor value function $V_{\text{AS}}(\pi)$. It derives the efficient influence function and semiparametric efficiency bound for multi-stage settings and proposes a multiply robust estimator that remains consistent under several nuisance-model misspecifications, together with an off-policy evaluation and learning framework that uses cross-fitting. The authors demonstrate robustness and efficiency through simulations and apply the method to MIMIC-III sepsis data, reporting improved policy performance and interpretable determinants of treatment decisions such as age and weight. The work supports personalized decision-making in critical care settings where death truncates follow-up, and lays groundwork for extensions to more general multi-decision-point regimes and time-to-death analyses.

Abstract

Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on the always-survivor value function. We derive a semiparametrically efficient, multiply robust estimator for multi-stage DTRs, demonstrating its robustness and efficiency. Empirical validation and an application to electronic health records showcase its utility for personalized treatment optimization.
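To make the cross-fitting idea mentioned above concrete, here is a minimal sketch in Python. It is not the paper's multiply robust estimator: it ignores truncation by death and the always-survivor stratum entirely, and instead applies cross-fitting to a standard single-stage AIPW (doubly robust) value estimate. The function name `cross_fit_aipw_value`, the constant propensity model, and the per-arm linear outcome regressions are all assumptions made for illustration.

```python
import numpy as np

def cross_fit_aipw_value(X, A, Y, policy, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of a single-stage policy value.

    Illustrative only (hypothetical helper, not the paper's MR estimator):
    binary treatment A in {0, 1}, no truncation by death.
    Returns (value estimate, plug-in standard error).
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k

        # Nuisance 1 (assumed model): constant propensity P(A = 1),
        # estimated on the training folds only.
        p1 = A[train].mean()

        # Nuisance 2 (assumed model): linear outcome regression per arm,
        # also estimated on the training folds only.
        def fit(a):
            m = train & (A == a)
            Z = np.column_stack([np.ones(m.sum()), X[m]])
            coef, *_ = np.linalg.lstsq(Z, Y[m], rcond=None)
            return coef

        coefs = {a: fit(a) for a in (0, 1)}

        # Evaluate the AIPW score on the held-out fold.
        Zt = np.column_stack([np.ones(test.sum()), X[test]])
        d = policy(X[test])                        # recommended treatment
        mu_d = np.where(d == 1, Zt @ coefs[1], Zt @ coefs[0])
        prop_d = np.where(d == 1, p1, 1 - p1)      # P(A = d) under the model
        match = (A[test] == d)                     # observed arm follows policy
        scores[test] = mu_d + match / prop_d * (Y[test] - mu_d)

    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)
```

The design point is that each observation's score is computed with nuisance models fitted on the other folds, so flexible nuisance learners can be plugged in without the estimate being biased by overfitting to the same observations. A call might look like `cross_fit_aipw_value(X, A, Y, lambda x: (x[:, 0] > 0).astype(int))`, where `X` is an `(n, p)` array and the lambda is a hypothetical threshold policy.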

Paper Structure

This paper contains 30 sections, 8 theorems, 44 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

For a fixed policy $\pi = (\pi_1, \pi_2)$, under assumptions A1 and A2-A5, the always-survivor value is identified as

Figures (5)

  • Figure 1: Value estimates under a fixed policy. The red line is drawn at the true value.
  • Figure 2:
  • Figure 3: Analysis of the MIMIC-III database. (Left) The training set value was evaluated with nuisance models learned on the training data, while the test set value was computed with nuisance models learned on the test data. Both values were evaluated on the test set. The left panel displays the discrepancy between the training set value function under the training set optimal policy ($\hat{\beta}$) and the test set value function under the test set optimal policy ($\beta^*$). The right panel compares the test set value function achieved by $\hat{\beta}$ and $\beta^*$. A solid red line is drawn at zero. (Right) Policy learned from the MR estimator.
  • Figure 4: Value estimates under a fixed policy across scenarios M1-M6. The red horizontal line is drawn at the true always-survivor value.
  • Figure 5: Value estimates from 500 independent simulated off-policy learning runs. Left panels of each plot show $\widehat{V}_\text{MR}(\hat{\beta}_\text{MR})$ and $\widehat{V}_\text{AIPW}(\hat{\beta}_\text{AIPW})$, while right panels show $V(\hat{\beta}_\text{MR})$ and $V(\hat{\beta}_\text{AIPW})$. The red horizontal line indicates the true optimal $V(\beta^*)$.

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 1
  • proof : Proof of Lemma 1
  • proof : Proof of Theorem 1
  • proof : Proof of Theorem 3
  • Lemma 2
  • ...and 4 more