Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data

Shosei Sakaguchi

Abstract

Public policies and medical interventions often involve dynamic treatment assignments, in which individuals receive a sequence of interventions over multiple stages. We study the statistical learning of optimal dynamic treatment regimes (DTRs) that determine the optimal treatment assignment for each individual at each stage based on their evolving history. We propose a novel, doubly robust, classification-based method for learning the optimal DTR from observational data under the sequential ignorability assumption. The method proceeds via backward induction: at each stage, it constructs and maximizes an augmented inverse probability weighting (AIPW) estimator of the policy value function to learn the optimal stage-specific policy. We show that the resulting DTR achieves an optimal convergence rate of $n^{-1/2}$ for welfare regret under mild convergence conditions on estimators of the nuisance components.
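As a rough illustration of the building block named in the abstract, the sketch below computes standard single-stage AIPW scores and picks the policy that maximizes their sample mean over a small policy class. It is not the paper's multi-stage estimator: the function names (`aipw_scores`, `best_threshold_policy`), the threshold policy class, and the assumption that the nuisance estimates `e_hat` (propensities) and `mu_hat` (outcome regressions) are already available are all hypothetical choices made for this example. In the backward-induction setting described above, a step of this kind would be repeated from the final stage to the first.

```python
import numpy as np

def aipw_scores(y, a, e_hat, mu_hat):
    """Single-stage AIPW score for every unit and every action.

    y:      (n,) observed outcomes
    a:      (n,) observed actions, integers in {0, ..., K-1}
    e_hat:  (n, K) estimated propensity scores (assumed given)
    mu_hat: (n, K) estimated outcome regressions (assumed given)
    """
    n, k = mu_hat.shape
    one_hot = np.eye(k)[a]                              # (n, K) indicator of observed action
    residual = (y - mu_hat[np.arange(n), a])[:, None]   # outcome-model residual at observed action
    # Outcome-model prediction plus inverse-probability-weighted residual correction.
    return mu_hat + one_hot * residual / e_hat

def best_threshold_policy(x, scores, thresholds):
    """Pick c maximizing the estimated value of the binary policy 1{x > c}.

    The threshold class is a hypothetical stand-in for a policy class Pi.
    """
    rows = np.arange(len(x))
    values = [scores[rows, (x > c).astype(int)].mean() for c in thresholds]
    best = int(np.argmax(values))
    return thresholds[best], values[best]

if __name__ == "__main__":
    # Tiny hand-made data set, for illustration only.
    y = np.array([1.0, 2.0])
    a = np.array([0, 1])
    e_hat = np.full((2, 2), 0.5)
    mu_hat = np.zeros((2, 2))
    scores = aipw_scores(y, a, e_hat, mu_hat)
    print(best_threshold_policy(np.array([0.0, 1.0]), scores, [-1.0, 0.5, 2.0]))
```

The score is doubly robust in the usual sense: its mean stays consistent for the policy value if either the propensity model or the outcome model is correctly specified, which is the property the abstract refers to.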

Paper Structure (20 sections, 12 theorems, 124 equations, 1 figure, 5 tables, 1 algorithm)

Introduction
Setup
Dynamic Treatment Framework
Dynamic Treatment Choice Problem
Learning of the Optimal DTR
Fitted Q-evaluation and Backward Induction
Learning of the Optimal DTRs through Backward Induction
Statistical Properties
Existing Approach
Simulation Study
Empirical Application
Conclusion
Proof of Theorem \ref{thm:main_theorem_backward}
Preliminary Results and Proofs of Lemmas \ref{lem:optimality_backward_induction}, \ref{lem:entropy_integral_bound}, \ref{lem:helpful_lemma}, \ref{lem:bound_influence_difference_function}, and \ref{lem:asymptotic_estimated_policy_difference_function}
Preliminary Results and Proofs of Lemmas \ref{lem:optimality_backward_induction}, \ref{lem:entropy_integral_bound}, and \ref{lem:bound_influence_difference_function}
...and 5 more sections

Key Result

Lemma 3.1

Under Assumptions asm:sequential independence, asm:overlap, and asm:first-best, $\pi^{\ast,B}$ is the optimal DTR over $\Pi$; i.e.,

Figures (1)

Figure 1: Estimated DTR for class-type allocation in grades K and 1

Theorems & Definitions (27)

Example 2.1: Optimal Starting/Stopping Problem
Lemma 3.1
proof
Definition 4.1
Lemma 4.2
proof
Theorem 4.3
proof
Theorem 5.1
proof
...and 17 more
