Asymptotic Inference for Multi-Stage Stationary Treatment Policy with Variable Selection

Daiqi Gao; Yufeng Liu; Donglin Zeng

TL;DR

This paper addresses inference for multi-stage stationary treatment policies (MSTPs) in high-dimensional settings using offline, multi-stage data. It develops an estimation framework based on augmented inverse probability weighted estimation (AIPWE) with an L1 penalty to learn sparse, interpretable policy parameters under an L2 constraint, and then constructs valid inference via a two-step decorrelated one-step estimator grounded in Neyman orthogonality. The authors prove asymptotic normality of the one-step policy estimators and provide practical implementation with stabilized importance weights, nuisance Q-function estimation, and nonconvex optimization, supported by simulations and a real-data study on diabetes management. The results show that the approach yields sparse, near-optimal policies with valid per-parameter confidence intervals, enabling reliable interpretation and deployment in precision medicine contexts.
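The AIPWE value criterion mentioned above can be made concrete with a small sketch. None of the following assumptions come from the paper:

- treatments are binary;
- the stationary policy is a hypothetical logistic rule $\pi_\theta(1 \mid s) = \mathrm{expit}(\theta^\top s)$, shared by every stage;
- behavior probabilities are known;
- the target is the total (undiscounted) reward over $T$ stages;
- a Q-function estimate `q_hat` is supplied by the user;
- self-normalized per-stage weights stand in for the paper's weight stabilization.

The estimator is the standard sequential doubly robust form $\frac{1}{n}\sum_i \sum_t [w_{i,t}(R_{i,t} - \hat Q_t(S_{i,t}, A_{i,t})) + w_{i,t-1}\hat V_t(S_{i,t})]$. Here $w_{i,t}$ is the cumulative importance ratio, $w_{i,0} = 1$, and $\hat V_t(s) = \sum_a \pi_\theta(a \mid s)\hat Q_t(s, a)$.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One stage of one trajectory: (features s_t, action a_t in {0, 1}, reward r_t, behavior prob b(a_t | s_t)).
Stage = Tuple[Sequence[float], int, float, float]


def policy_prob(theta: Sequence[float], s: Sequence[float], a: int) -> float:
    """Hypothetical stationary logistic policy: P(A = 1 | s) = expit(theta^T s) at every stage."""
    z = sum(t * x for t, x in zip(theta, s))
    p1 = 1.0 / (1.0 + math.exp(-z))
    return p1 if a == 1 else 1.0 - p1


def aipw_value(
    theta: Sequence[float],
    trajectories: List[List[Stage]],
    q_hat: Callable[[int, Sequence[float], int], float],
    stabilize: bool = True,
) -> float:
    """Sequential doubly robust (AIPW-type) estimate of the total reward of pi_theta over T stages."""
    n, horizon = len(trajectories), len(trajectories[0])
    # Cumulative importance weights w[i][t] = prod_{k <= t} pi_theta(a_k | s_k) / b(a_k | s_k).
    w = []
    for traj in trajectories:
        cum, row = 1.0, []
        for s, a, _, b in traj:
            cum *= policy_prob(theta, s, a) / b
            row.append(cum)
        w.append(row)
    if stabilize:
        # Assumed stabilization: divide each stage's weights by their sample mean (self-normalization).
        for t in range(horizon):
            mean_t = sum(row[t] for row in w) / n
            for row in w:
                row[t] /= mean_t
    total = 0.0
    for i, traj in enumerate(trajectories):
        w_prev = 1.0  # weight before the first stage
        for t, (s, a, r, _) in enumerate(traj):
            # Plug-in value of the current state under pi_theta: sum_a pi_theta(a | s) * Q_t(s, a).
            v_hat = sum(policy_prob(theta, s, a2) * q_hat(t, s, a2) for a2 in (0, 1))
            total += w[i][t] * (r - q_hat(t, s, a)) + w_prev * v_hat
            w_prev = w[i][t]
    return total / n
```

With `stabilize=False` this reduces to the unnormalized sequential estimator. Dividing the result by $T$ gives a per-stage average reward.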

Abstract

Dynamic treatment regimes, or policies, are sequences of decision functions over multiple stages that are tailored to individual features. One important class of treatment policies in practice, namely multi-stage stationary treatment policies, prescribes treatment assignment probabilities using the same decision function across stages, where the decision is based on the same set of features consisting of time-evolving variables (e.g., routinely collected disease biomarkers). Although there is an extensive literature on constructing valid inference for the value function associated with dynamic treatment policies, little work has focused on the policies themselves, especially in the presence of high-dimensional feature variables. We aim to fill this gap. Specifically, we first estimate the multi-stage stationary treatment policy using an augmented inverse probability weighted estimator of the value function to increase asymptotic efficiency, and further apply a penalty to select important feature variables. We then construct one-step improvements of the policy parameter estimators for valid inference. Theoretically, we show that the improved estimators are asymptotically normal, even if nuisance parameters are estimated at a slow convergence rate and the dimension of the feature variables increases with the sample size. Our numerical studies demonstrate that the proposed method estimates a sparse policy with a near-optimal value function and conducts valid inference for the policy parameters.
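The one-step improvement can be illustrated generically. Start from a penalized (possibly shrunken) estimate $\hat\theta$ and take $\tilde\theta = \hat\theta - \hat H^{-1}\nabla L_n(\hat\theta)$. Then form Wald intervals from a sandwich variance $\hat H^{-1}\hat S\hat H^{-1}/n$.

This is a minimal sketch under loudly stated assumptions:

- a toy least-squares loss $L_n(\theta) = \frac{1}{2n}\sum_i (y_i - x_i^\top\theta)^2$ stands in for the paper's negative AIPW value;
- the dimension is small enough to invert the Hessian exactly, whereas the paper's high-dimensional setting would require a decorrelated, regularized estimate of $\hat H^{-1}$;
- the helper names `inverse`, `residuals`, and `one_step_with_ci` are hypothetical.

```python
import math
from statistics import NormalDist


def inverse(A):
    """Gauss-Jordan inverse (partial pivoting) of a small square matrix given as nested lists."""
    d = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(d)] for i, row in enumerate(A)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(d):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[d:] for row in M]


def residuals(theta, X, y):
    return [yi - sum(t * x for t, x in zip(theta, xi)) for xi, yi in zip(X, y)]


def one_step_with_ci(theta_hat, X, y, level=0.95):
    """One-step update theta_tilde = theta_hat - H^{-1} grad for the toy squared-error loss,
    plus Wald intervals from a sandwich covariance estimate."""
    n, d = len(X), len(theta_hat)
    r = residuals(theta_hat, X, y)
    grad = [-sum(ri * xi[j] for xi, ri in zip(X, r)) / n for j in range(d)]
    hess = [[sum(xi[j] * xi[k] for xi in X) / n for k in range(d)] for j in range(d)]
    h_inv = inverse(hess)
    theta_tilde = [theta_hat[j] - sum(h_inv[j][k] * grad[k] for k in range(d)) for j in range(d)]
    # Sandwich covariance H^{-1} S H^{-1} / n, with S the mean outer product of per-sample scores at theta_tilde.
    r_t = residuals(theta_tilde, X, y)
    S = [[sum(ri * ri * xi[j] * xi[k] for xi, ri in zip(X, r_t)) / n for k in range(d)] for j in range(d)]
    cov = [[sum(h_inv[j][a] * S[a][b] * h_inv[b][k] for a in range(d) for b in range(d)) / n
            for k in range(d)] for j in range(d)]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    cis = [(theta_tilde[j] - z * math.sqrt(cov[j][j]), theta_tilde[j] + z * math.sqrt(cov[j][j]))
           for j in range(d)]
    return theta_tilde, cis
```

Because the toy loss is quadratic, one step from any starting point lands exactly on the least-squares solution. For a nonquadratic objective such as an estimated value function, a single step is only expected to correct the first-order bias of the penalized estimate.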

Paper Structure

This paper contains 26 sections, 10 theorems, 168 equations, 9 figures, 3 tables, and 4 algorithms.

Introduction
Methodology
Estimate Policy Parameter with Variable Selection
Statistical Inference for Sparse High Dimensional Parameters
Implementation
Stabilizing the importance sampling weight
Estimating the nuisance parameters
Optimizing constrained nonconvex nondifferentiable functions
Theoretical Results
Simulation Study
Real Data Analysis
Discussion
Implementation Details
Estimating the Nuisance Parameters
Optimization with $L_1$ Penalty and $L_2$ Constraint
...and 11 more sections
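The outline mentions optimizing an objective with an $L_1$ penalty under an $L_2$ constraint. Below is a minimal projected proximal-gradient sketch. Each step applies:

1. a gradient step on the smooth part of the objective;
2. soft-thresholding for the $L_1$ penalty;
3. a projection for the $L_2$ constraint.

This rests on several assumptions:

- the constraint is the ball $\|\theta\|_2 \le 1$;
- the caller supplies a gradient of the smooth part (for example, the negative estimated value);
- the function names, step size, and iteration count are hypothetical.

The paper's own algorithm for its nonconvex problem may differ.

```python
import math


def soft_threshold(v, tau):
    """Componentwise soft-thresholding: proximal map of tau * ||v||_1."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def project_l2_ball(v, radius=1.0):
    """Euclidean projection onto {theta : ||theta||_2 <= radius}."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm <= radius else [x * radius / norm for x in v]


def prox_grad_l1_l2(grad, theta0, lam, step=0.5, radius=1.0, iters=500):
    """Hypothetical projected proximal-gradient loop for f(theta) + lam * ||theta||_1, ||theta||_2 <= radius."""
    theta = list(theta0)
    for _ in range(iters):
        g = grad(theta)
        z = [t - step * gi for t, gi in zip(theta, g)]  # gradient step on the smooth part f
        theta = project_l2_ball(soft_threshold(z, step * lam), radius)
    return theta
```

For the toy loss $f(\theta) = \frac{1}{2}\|\theta - c\|_2^2$ with $c = (3, 0.05, -2)$ and $\lambda = 0.1$, the small middle coordinate is set exactly to zero. The remaining coordinates are rescaled onto the unit sphere.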

Key Result

Lemma 3.1

Under Assumptions asp:nuca through asp:V.sc, when $\lambda_{\bm{\theta}} \simeq \sqrt{\log d / n}$, we have

Figures (9)

Figure 1: The first row presents the average reward of MSTP, PEARL, and the random policy in Scenario 1. The second and third rows show the MAD and coverage probability of $\theta_1$. The columns correspond to different values of $T$. The dotted line indicates the nominal coverage probability for $\theta_1$.
Figure 2: The first row presents the average reward of MSTP, PEARL, and the random policy in Scenario 2. The second and third rows show the MAD and coverage probability of $\theta_1$. The columns correspond to different values of $T$. The dotted line indicates the nominal coverage probability for $\theta_1$.
Figure 3: The MAD and coverage probability of $\theta_2$, and their averages over $\theta_{3:d}$, in Scenario 1. The columns correspond to different values of $T$, and the rows correspond to different metrics. The dotted line indicates the nominal coverage probability.
Figure 4: The MAD and coverage probability of $\theta_2$, and their averages over $\theta_{3:d}$, in Scenario 2. The columns correspond to different values of $T$, and the rows correspond to different metrics. The dotted line indicates the nominal coverage probability.
Figure 5: Comparison of the coverage probability of the CIs for the sparse estimator $\hat{\bm{\theta}}$ and the one-step estimator $\tilde{\bm{\theta}}$. The columns correspond to different values of $T$, and the rows correspond to different simulation scenarios.
...and 4 more figures
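The captions compare empirical coverage probability against a nominal level. The sketch below shows how empirical coverage is typically computed in such simulations; it is not the paper's simulation design. For each replicate it:

1. draws a dataset from a hypothetical normal model (the mean, standard deviation, sample size, replicate count, and seed are all arbitrary assumptions);
2. forms a 95% Wald interval for the mean;
3. records whether the interval contains the true value.

The reported coverage is the fraction of intervals that contain the true value.

```python
import random
from statistics import NormalDist, fmean, stdev


def empirical_coverage(intervals, truth):
    """Fraction of (lo, hi) intervals that contain the true parameter value."""
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)


def wald_interval(sample, level=0.95):
    """Normal-approximation interval mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = fmean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return m - z * se, m + z * se


# Hypothetical simulation settings, chosen only for illustration.
rng = random.Random(2024)
true_mean = 1.0
intervals = [wald_interval([rng.gauss(true_mean, 2.0) for _ in range(200)]) for _ in range(1000)]
coverage = empirical_coverage(intervals, true_mean)
```

A valid procedure yields `coverage` close to the nominal 0.95, up to Monte Carlo error of roughly $\sqrt{0.95 \times 0.05 / 1000} \approx 0.007$.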

Theorems & Definitions (20)

Lemma 3.1
Theorem 3.2
Definition D.1: Neyman orthogonality
Lemma D.1
Proof
Lemma D.2
Proof
Lemma D.3: Concentration of the gradient and Hessian
Proof
Lemma D.4: Central limit theorem for the score function
...and 10 more
