Evaluating and Learning Optimal Dynamic Treatment Regimes under Truncation by Death
Sihyung Park, Wenbin Lu, Shu Yang
TL;DR
This paper addresses the evaluation and learning of dynamic treatment regimes when truncation by death leaves outcomes ill-defined. It adopts a principal-stratification target, the always-survivor value function $V_{\text{AS}}(\pi)$. The authors derive the efficient influence function and the semiparametric efficiency bound for multi-stage settings, and propose a multiply robust estimator that remains consistent under several forms of nuisance-model misspecification, together with an off-policy evaluation and learning framework that uses cross-fitting. Simulations demonstrate the estimator's robustness and efficiency, and an application to MIMIC-III sepsis data shows improved policy performance, with age and weight emerging as interpretable determinants of treatment decisions. The work supports reliable, personalized decision-making in critical care settings where death truncates follow-up, and lays groundwork for extensions to more general multi-decision-point regimes and time-to-death analyses.
Abstract
Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on the always-survivor value function. We derive a semiparametrically efficient, multiply robust estimator for multi-stage DTRs, demonstrating its robustness and efficiency. Empirical validation and an application to electronic health records showcase its utility for personalized treatment optimization.
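To make the target concrete, the always-survivor value function can be sketched for a simplified single-stage setting with a binary treatment. The notation here is an assumption for illustration and is not taken from the paper: $X$ denotes baseline covariates, $S(a)$ the potential survival indicator under treatment $a \in \{0, 1\}$, and $Y(a)$ the potential outcome, which is defined only when $S(a) = 1$. Under these assumptions, one standard principal-stratification formulation is

$$
V_{\text{AS}}(\pi) = \mathbb{E}\left[\, Y\big(\pi(X)\big) \;\middle|\; S(0) = 1,\; S(1) = 1 \,\right].
$$

Conditioning on the always-survivor stratum $\{S(0) = 1, S(1) = 1\}$ ensures that both $Y(0)$ and $Y(1)$ are well-defined for every individual being averaged over, so the value of a regime $\pi$ is meaningful regardless of which treatment it assigns. The paper's multi-stage definition may differ in its details.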
