Valid post-selection inference in Robust Q-learning

Jeremiah Jones; Ashkan Ertefaie; James R. McKay; David W. Oslin; Robert L. Strawderman

TL;DR

This paper tackles the challenge of valid post-selection inference in multistage Robust Q-learning for adaptive treatment strategies by adapting the Universal Post-Selection Inference (UPoSI) framework to a semiparametric, two-stage setting. It develops cross-fitting-based nuisance estimation together with a perturbation bootstrap to produce confidence regions for the selected stagewise contrast parameters that hold uniformly over sparse model spaces. A key contribution is extending UPoSI to population targets in Robust Q-learning and further adapting the methodology to targets defined conditionally on the design, which yields tighter inference when the design is treated as fixed. Theoretical results establish the asymptotic validity of the UPoSI regions and of the bootstrap, and simulations and an application to the ExTENd data demonstrate favorable finite-sample performance and practical utility for adaptive treatment strategy estimation.
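To make the cross-fitting step concrete, here is a minimal single-stage sketch. It assumes a partially linear working model Y = A·Zᵀψ + h(X) + ε, where Z holds candidate tailoring variables (functions of X) and ψ is the contrast parameter; under this model, Y − E[Y | X] = (A − E[A | X])·Zᵀψ + ε, so ψ can be estimated by regressing the outcome residual on the treatment residual times Z. The function names, the linear nuisance fits, and the simulated data are illustrative assumptions, not the authors' implementation; in practice the nuisance functions would be fit with flexible learners, and a multistage analysis would repeat this backward through the stages.

```python
import numpy as np


def cross_fit_residuals(X, A, Y, n_folds=5, seed=0):
    """Out-of-fold residuals Y - m_hat(X) and A - pi_hat(X).

    m(X) = E[Y | X] and pi(X) = E[A | X] are fit here by least squares on
    [1, X] purely for illustration; a flexible learner would normally be used.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds        # balanced random fold labels
    design = np.column_stack([np.ones(n), X])
    y_res = np.empty(n)
    a_res = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit both nuisance regressions on the other folds only.
        coef_y, *_ = np.linalg.lstsq(design[train], Y[train], rcond=None)
        coef_a, *_ = np.linalg.lstsq(design[train], A[train], rcond=None)
        # Evaluate them on the held-out fold, so every residual is out-of-fold.
        y_res[test] = Y[test] - design[test] @ coef_y
        a_res[test] = A[test] - design[test] @ coef_a
    return y_res, a_res


def estimate_contrast(Z, X, A, Y, n_folds=5, seed=0):
    """Regress the residualized outcome on (A - pi_hat(X)) * Z to estimate psi."""
    y_res, a_res = cross_fit_residuals(X, A, Y, n_folds, seed)
    design = a_res[:, None] * Z                  # residualized treatment-by-Z columns
    psi_hat, *_ = np.linalg.lstsq(design, y_res, rcond=None)
    return psi_hat


# Simulated single-stage example with a randomized binary treatment.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 0.5, size=n).astype(float)
Z = np.column_stack([np.ones(n), X[:, 0]])      # intercept + one tailoring variable
psi_true = np.array([1.0, -0.5])
Y = A * (Z @ psi_true) + X @ np.array([0.3, 0.2, -0.1]) + rng.normal(size=n)
psi_hat = estimate_contrast(Z, X, A, Y)
print(psi_hat)  # should be close to psi_true
```

Fitting each nuisance function on data that excludes the fold where it is evaluated is what lets flexible, slowly converging learners be plugged in without biasing the residualized regression.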

Abstract

Q-learning facilitates the development of an optimal adaptive treatment strategy through stagewise regression on a pre-specified set of tailoring variables and confounders. Semiparametric robust Q-learning eliminates the residual confounding that can occur when parametric working models for confounding influences are misspecified. However, in the presence of many potential tailoring variables, constructing an optimal adaptive treatment strategy using either approach may lead to including extraneous variables that contribute little or no benefit while increasing implementation costs, thereby placing an undue burden on patients. Using data-driven selection processes to identify a smaller set of informative prognostic factors is straightforward; however, proper statistical inference must account for this selection process. In this paper, we adapt the Universal Post-Selection Inference (UPoSI) procedure to the semiparametric Robust Q-learning method. UPoSI, introduced for use with linear models, allows for very general variable selection mechanisms. Our approach addresses the unique challenges stemming from the use of UPoSI with semiparametric multistage decision methods. Theoretical and simulation results demonstrate the validity of the proposed confidence regions. We illustrate our proposed methods through an application to adaptive treatment strategy estimation for substance abuse.
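The reason UPoSI accommodates essentially arbitrary selection rules can be seen in a short derivation. As a sketch in the linear-model setting where UPoSI was introduced, let Σ be the population Gram matrix of the covariates and Γ their cross-moment with the outcome, with hatted symbols for sample versions. For a submodel M with invertible Σ_M, let β_M = Σ_M⁻¹Γ_M be the target and β̂_M = Σ̂_M⁻¹Γ̂_M its estimate; the subscript M keeps only the rows and columns in M, and ‖·‖∞ is the largest absolute entry. How these quantities map onto the residualized Robust Q-learning design is an assumption of this illustration, not a statement of the paper's construction.

```latex
\begin{aligned}
\hat\Sigma_M(\hat\beta_M - \beta_M)
  &= \hat\Gamma_M - \hat\Sigma_M\beta_M
   = (\hat\Gamma_M - \Gamma_M) - (\hat\Sigma_M - \Sigma_M)\beta_M,\\
\|\hat\Sigma_M(\hat\beta_M - \beta_M)\|_\infty
  &\le \|\hat\Gamma - \Gamma\|_\infty + \|\hat\Sigma - \Sigma\|_\infty\,\|\beta_M\|_1,\\
\hat{\mathcal R}_M
  &= \bigl\{\theta \in \mathbb R^{|M|} :
     \|\hat\Sigma_M(\hat\beta_M - \theta)\|_\infty \le C_\Gamma + C_\Sigma\|\theta\|_1\bigr\},\\
P\bigl(\beta_M \in \hat{\mathcal R}_M \text{ for all } M\bigr)
  &\ge P\bigl(\|\hat\Gamma - \Gamma\|_\infty \le C_\Gamma,\;
     \|\hat\Sigma - \Sigma\|_\infty \le C_\Sigma\bigr).
\end{aligned}
```

The first line uses Γ_M = Σ_M β_M. Because the bound in the second line depends on M only through the l1 norm of β_M, choosing the two constants as a joint (1 − α) quantile gives coverage simultaneously over all submodels, and therefore also for a submodel picked by any data-driven rule.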

Paper Structure

This paper contains 45 sections, 25 theorems, 174 equations, 3 figures, and 3 tables.

Introduction
Notation
Submodel Selection in Robust Q-learning
Robust Q-learning with Fixed Submodels
Estimation via Cross-fitting
The Perturbation Bootstrap with Cross-fitting
UPoSI for Population Parameters
Adaptation of UPoSI to Robust Q-learning
Discussion of Population-level Inference
Conditioning on the Design
Conditional vs. Population Inference
Defining the Conditional Targets
Confidence Regions for Conditional Targets
Theory for Conditional Targets
Simulation Study
...and 30 more sections

Key Result

Theorem 5.1

Under the boundedness, bounded-model, rate, model-selection-consistency, and regularity assumptions stated in the section on UPoSI theory, the confidence regions defined in equations (uposi-1-dagger-star) and (uposi-2-dagger-star) satisfy:

Figures (3)

Figure 1: Confidence interval performance for each method, grouped by Robust Q-learning stage and sample size, when variables are selected by least angle regression (LAR). Top: median confidence interval length. Bottom: false coverage rate.
Figure 2: Confidence interval performance for each method, grouped by Robust Q-learning stage and sample size, when variables are selected by forward stepwise selection (FS). Top: median confidence interval length. Bottom: false coverage rate.
Figure 3: Comparison of naive inference, selective inference (SI), and the proposed method (UPoSI) on the ExTENd data. Left: discovery of nonzero coefficients. Right: confidence interval length.
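Figures 1 and 2 summarize intervals by median length and false coverage rate. The sketch below shows one way such summaries could be computed from simulation output. It assumes intervals are stored as (replications × parameters) arrays with NaN for parameters that were not selected, and that the false coverage rate is the average over replications of the fraction of reported intervals that miss their targets; the function name and array layout are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np


def summarize_intervals(lower, upper, target):
    """Return (median length, false coverage rate) for arrays of shape (n_reps, n_params).

    NaN bounds mark parameters that were not selected (no interval reported).
    The false coverage rate is taken, as an assumption, to be the mean over
    replications of the fraction of reported intervals missing their target;
    replications with no reported interval contribute 0.
    """
    lower, upper, target = (np.asarray(a, dtype=float) for a in (lower, upper, target))
    reported = ~np.isnan(lower) & ~np.isnan(upper)
    lengths = (upper - lower)[reported]
    with np.errstate(invalid="ignore"):          # NaN comparisons simply give False
        miss = ((target < lower) | (target > upper)) & reported
    n_reported = reported.sum(axis=1)
    n_miss = miss.sum(axis=1)
    frac = np.where(n_reported > 0, n_miss / np.maximum(n_reported, 1), 0.0)
    return float(np.median(lengths)), float(frac.mean())


# Two replications, two parameters; the second parameter is unselected in rep 1.
lower = [[0.0, np.nan], [0.5, -1.0]]
upper = [[1.0, np.nan], [1.5, 0.0]]
target = [[0.2, 0.0], [0.2, -0.5]]
print(summarize_intervals(lower, upper, target))  # (1.0, 0.25)
```

Averaging per-replication miss fractions, rather than pooling all intervals, keeps replications that select many variables from dominating the rate.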

Theorems & Definitions (42)

Theorem 5.1: Validity of the confidence regions
Theorem 6.1: Validity of the conditional regions
Corollary 6.2: Validity of the conditional intervals
Theorem S1.1: Validity of the perturbation bootstrap
Corollary S1.2: Validity of the confidence intervals
Theorem S1.3: Conditional perturbation bootstrap
Theorem S4.1
proof
Lemma S7.1
proof
...and 32 more
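Theorem S1.1 concerns the perturbation bootstrap. Below is a minimal sketch of how a perturbation (multiplier) bootstrap can calibrate the two constants of a UPoSI-type region of the form {θ : ‖Σ̂_M(β̂_M − θ)‖∞ ≤ C_Γ + C_Σ‖θ‖₁}. Assumptions, all illustrative: rows of `W` are a (possibly residualized) design and `y` an outcome; weights are i.i.d. Exp(1); the joint quantile is replaced by a Bonferroni split of α across the two statistics; cross-fitting is omitted; and the function names are hypothetical. This is not the authors' procedure.

```python
import numpy as np


def perturbation_constants(W, y, alpha=0.05, n_boot=1000, seed=0):
    """Perturbation-bootstrap constants (c_gamma, c_sigma) for a UPoSI-type region.

    Each bootstrap draw reweights observation i by w_i ~ Exp(1), so the centred
    weight xi_i = w_i - 1 has mean 0 and variance 1. Using all columns of W
    lets the same constants serve every submodel of W.
    """
    n = W.shape[0]
    rng = np.random.default_rng(seed)
    t_gamma = np.empty(n_boot)
    t_sigma = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.exponential(1.0, size=n) - 1.0
        d_gamma = (W * (xi * y)[:, None]).mean(axis=0)  # Gamma* - Gamma_hat
        d_sigma = (W * xi[:, None]).T @ W / n           # Sigma* - Sigma_hat
        t_gamma[b] = np.abs(d_gamma).max()
        t_sigma[b] = np.abs(d_sigma).max()
    # Bonferroni split (a simplification): each sup-norm at level 1 - alpha/2,
    # so both bounds hold jointly with probability of at least about 1 - alpha.
    q = 1.0 - alpha / 2.0
    return float(np.quantile(t_gamma, q)), float(np.quantile(t_sigma, q))


def in_uposi_region(theta, W, y, model, c_gamma, c_sigma):
    """Check ||Sigma_M (beta_M_hat - theta)||_inf <= c_gamma + c_sigma * ||theta||_1."""
    n = W.shape[0]
    W_m = W[:, model]                            # columns of the selected submodel
    sigma_m = W_m.T @ W_m / n
    gamma_m = W_m.T @ y / n
    beta_m = np.linalg.solve(sigma_m, gamma_m)   # submodel least-squares estimate
    lhs = np.abs(sigma_m @ (beta_m - theta)).max()
    return bool(lhs <= c_gamma + c_sigma * np.abs(theta).sum())


# Usage on simulated data; [0, 2] stands in for any data-selected submodel.
rng = np.random.default_rng(2)
n, p = 500, 6
W = rng.normal(size=(n, p))
y = W[:, 0] - 0.5 * W[:, 2] + rng.normal(size=n)
c_gamma, c_sigma = perturbation_constants(W, y, n_boot=500)
covered = in_uposi_region(np.array([1.0, -0.5]), W, y, [0, 2], c_gamma, c_sigma)
print(c_gamma, c_sigma, covered)  # the true submodel coefficients should be covered
```

Because the constants are computed once from the full design, the check can be repeated for whichever submodel a selection rule returns without recalibrating.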
