Valid post-selection inference in Robust Q-learning

Jeremiah Jones, Ashkan Ertefaie, James R. McKay, David W. Oslin, Robert L. Strawderman

TL;DR

This paper tackles the challenge of valid post-selection inference in multistage Robust Q-learning for adaptive treatment strategies by adapting the Universal Post-Selection Inference (UPoSI) framework to a semiparametric, two-stage setting. It develops cross-fitting-based nuisance estimation together with a perturbation bootstrap to produce confidence regions for the selected stagewise contrast parameters that hold uniformly over sparse model spaces. A key contribution is extending UPoSI to population targets in Robust Q-learning and further adapting the methodology to conditional design targets to obtain tighter inference when the design is treated as fixed. Theoretical results show asymptotic validity of the UPoSI regions and bootstrap, with simulations and an ExTENd data application demonstrating favorable finite-sample performance and practical utility for adaptive treatment strategy estimation.

Abstract

Q-learning facilitates the development of an optimal adaptive treatment strategy through stagewise regression on a pre-specified set of tailoring variables and confounders. Semiparametric robust Q-learning eliminates the residual confounding that can occur when parametric working models for confounding influences are misspecified. However, in the presence of many potential tailoring variables, constructing an optimal adaptive treatment strategy using either approach may lead to including extraneous variables that contribute little or no benefit while increasing implementation costs, thereby placing an undue burden on patients. Using data-driven selection processes to identify a smaller set of informative prognostic factors is straightforward; however, proper statistical inference must account for this selection process. In this paper, we adapt the Universal Post-Selection Inference (UPoSI) procedure to the semiparametric Robust Q-learning method. UPoSI, introduced for use with linear models, allows for very general variable selection mechanisms. Our approach addresses the unique challenges stemming from the use of UPoSI with semiparametric multistage decision methods. Theoretical and simulation results demonstrate the validity of the proposed confidence regions. We illustrate our proposed methods through an application to adaptive treatment strategy estimation for substance abuse.
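
As a rough illustration of the stagewise regression mentioned in the abstract, the sketch below implements standard parametric two-stage Q-learning by backward induction with ordinary least squares. It is not the semiparametric Robust Q-learning estimator or the UPoSI procedure from the paper; the function names, the binary 0/1 treatment coding, and the choice of main-effect and tailoring terms are assumptions made for the example.

```python
import numpy as np


def fit_stage(main, tailoring, action, outcome):
    """Least-squares fit of outcome ~ main + action * tailoring.

    Returns (beta, psi): main-effect coefficients and treatment-contrast
    coefficients. All names here are hypothetical, chosen for illustration.
    """
    design = np.hstack([main, action[:, None] * tailoring])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    p = main.shape[1]
    return coef[:p], coef[p:]


def two_stage_q_learning(x1, a1, x2, a2, y):
    """Standard two-stage Q-learning with linear working models.

    x1, x2: (n, p1) and (n, p2) covariate arrays; a1, a2: 0/1 treatment
    vectors; y: final outcome, larger is better (an assumption).
    """
    ones = np.ones((len(y), 1))

    # Stage 2: regress the outcome on the history plus treatment interactions.
    main2 = np.hstack([ones, x1, x2, a1[:, None]])
    tail2 = np.hstack([ones, x2])
    beta2, psi2 = fit_stage(main2, tail2, a2, y)

    # Pseudo-outcome: fitted value under the better stage 2 treatment.
    pseudo = main2 @ beta2 + np.maximum(tail2 @ psi2, 0.0)

    # Stage 1: regress the pseudo-outcome on baseline information.
    main1 = np.hstack([ones, x1])
    tail1 = np.hstack([ones, x1])
    _, psi1 = fit_stage(main1, tail1, a1, pseudo)
    return psi1, psi2


def recommend(tailoring, psi):
    """Treat (1) when the estimated contrast is positive, otherwise 0."""
    return (tailoring @ psi > 0).astype(int)
```

In this sketch every candidate tailoring variable enters the contrast. The setting the paper addresses arises when a data-driven selection step first picks a subset of those variables, so that inference on the selected contrast coefficients has to account for the selection.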

Paper Structure

This paper contains 45 sections, 25 theorems, 174 equations, 3 figures, and 3 tables.

Key Result

Theorem 5.1

Under the assumptions labeled assump:boundedness, assump:bounded-model, assump:rate-assumps, assump:model-selection-consistency, and assump:regularity, presented in sec:uposi-theory, the confidence regions eq:uposi-1-dagger-star and eq:uposi-2-dagger-star satisfy:

Figures (3)

  • Figure 1: Confidence interval performance for each method, grouped by the stage of Robust Q-learning and sample size when using LAR. Top: Median confidence interval length; Bottom: False coverage rates.
  • Figure 2: Confidence interval performance for each method, grouped by the stage of Robust Q-learning and sample size when using FS. Top: Median confidence interval length; Bottom: False coverage rates.
  • Figure 3: Comparison of naive inference, selective inference (SI), and the proposed method (UPoSI) on the ExTENd data. Left: Discovery of nonzero coefficients. Right: Confidence interval length.

Theorems & Definitions (42)

  • Theorem 5.1: Validity of the confidence regions
  • Theorem 6.1: Validity of the conditional regions
  • Corollary 6.2: Validity of the conditional intervals
  • Theorem S1.1: Validity of the perturbation bootstrap
  • Corollary S1.2: Validity of the confidence intervals
  • Theorem S1.3: Conditional perturbation bootstrap
  • Theorem S4.1
  • proof
  • Lemma S7.1
  • proof
  • ...and 32 more