Nuisance Function Tuning and Sample Splitting for Optimally Estimating a Doubly Robust Functional

Sean McGrath, Rajarshi Mukherjee

TL;DR

The paper shows that, by undersmoothing and sample splitting when constructing the nuisance function estimators, one can achieve minimax rates of convergence in all Hölder smoothness classes of the nuisance functions (i.e., the propensity score and outcome regression), provided that the marginal density of the covariates is sufficiently regular.

Abstract

Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in estimators and a first-order bias-corrected estimator, we illustrate the interplay between different tuning parameter choices for the nuisance function estimators and sample splitting strategies on the optimal rate of estimating the functional of interest. For each of these estimators and each sample splitting strategy, we show the necessity to either undersmooth or oversmooth the nuisance function estimators under low regularity conditions to obtain optimal rates of convergence for the functional of interest. Unlike the existing literature, we show that plug-in and first-order bias-corrected estimators can achieve minimax rates of convergence across all Hölder smoothness classes of the nuisance functions by careful combinations of sample splitting and nuisance function tuning strategies. We complement these results with numerical simulations illustrating the impact of different nuisance function tuning and sample splitting strategies.
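To make the interplay between resolution choice and sample splitting concrete, here is a minimal Python sketch; it is not the authors' implementation. It rests on several assumptions that are not stated in this excerpt: the functional is taken to be an expected conditional covariance $E[\mathrm{Cov}(A, Y \mid X)]$, the nuisance functions are fit by a piecewise-constant regressogram whose bin counts `k1` and `k2` play the role of the tuning parameters (larger values mean less smoothing), and "double sample splitting" is read as fitting each nuisance function on its own subsample and averaging on a third. The names `fit_regressogram` and `split_corrected_estimate` are hypothetical.

```python
import numpy as np


def fit_regressogram(x, target, k):
    """Piecewise-constant regression on [0, 1] with k equal-width bins.

    k acts as the resolution: larger k means less smoothing.
    """
    bins = np.minimum((x * k).astype(int), k - 1)
    sums = np.bincount(bins, weights=target, minlength=k)
    counts = np.bincount(bins, minlength=k)
    # Empty bins fall back to the overall mean of the target.
    means = np.where(counts > 0, sums / np.maximum(counts, 1), target.mean())

    def predict(x_new):
        idx = np.minimum((x_new * k).astype(int), k - 1)
        return means[idx]

    return predict


def split_corrected_estimate(x, a, y, k1, k2, rng):
    """Average of residual products with nuisances fit on separate subsamples.

    Assumed target (illustrative only): E[Cov(A, Y | X)].
    """
    idx = rng.permutation(len(x))
    part_p, part_mu, part_eval = np.array_split(idx, 3)
    # Each nuisance function is estimated on its own subsample.
    p_hat = fit_regressogram(x[part_p], a[part_p], k1)
    mu_hat = fit_regressogram(x[part_mu], y[part_mu], k2)
    # The functional is estimated on the held-out third.
    resid_a = a[part_eval] - p_hat(x[part_eval])
    resid_y = y[part_eval] - mu_hat(x[part_eval])
    return float(np.mean(resid_a * resid_y))


# Synthetic data (hypothetical design) with conditional covariance 0.5.
rng = np.random.default_rng(0)
n = 6000
x = rng.uniform(size=n)
eps_a = rng.normal(size=n)
eps_y = 0.5 * eps_a + rng.normal(size=n)
a = np.sin(2 * np.pi * x) + eps_a
y = np.cos(2 * np.pi * x) + eps_y

estimate = split_corrected_estimate(x, a, y, k1=16, k2=16, rng=rng)
print(estimate)
```

Varying `k1` and `k2` in this sketch (and comparing against a version that reuses one subsample for both nuisance fits) is one way to experiment with the tuning and splitting trade-offs the abstract describes, though the smooth nuisance functions used here do not reproduce the low-regularity regimes the paper studies.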

Paper Structure (29 sections, 20 theorems, 17 equations, 5 figures, 3 tables)

Key Result

Theorem 1

Suppose that double sample splitting is performed. Under the assumptions given in Section sec: motivation, the following statements hold:

Figures (5)

  • Figure 1: Rate-optimality of the estimators when using double sample splitting (top panel), single sample splitting (middle panel), and no sample splitting (bottom panel). The dashed lines illustrate the region of the parameter space where the respective estimators are rate optimal when using optimal resolution choices.
  • Figure 2: Optimally tuning the resolutions for $\hat{\psi}_{k_1, k_2}^{\mathrm{INT}}$, $\hat{\psi}_{k_1, k_2}^{\mathrm{MC}}$, and $\hat{\psi}_{k_1, k_2}^{\mathrm{IF}}$ with double sample splitting (SS).
  • Figure 3: Optimally tuning the resolutions for $\hat{\psi}_{k_1, k_2}^{\mathrm{INT}}$, $\hat{\psi}_{k_1, k_2}^{\mathrm{MC}}$, $\hat{\psi}_{k_1, k_2}^{\mathrm{IF}}$, and $\hat{\psi}_{k}^{\mathrm{NR}}$ with single sample splitting (SS).
  • Figure 4: Optimally tuning the resolutions for $\hat{\psi}_{k_1, k_2}^{\mathrm{INT}}$, $\hat{\psi}_{k_1, k_2}^{\mathrm{MC}}$, $\hat{\psi}_{k_1, k_2}^{\mathrm{IF}}$ and $\hat{\psi}_{k}^{\mathrm{NR}}$ with no sample splitting (SS).
  • Figure 5: Conditional mean function $\mu$ in the low, medium, and high regularity regimes in the simulations. Grey points show 5,000 samples from the distribution of $A$ and $Y$.

Theorems & Definitions (35)

  • Theorem 1
  • Corollary 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Corollary 2
  • Corollary 3
  • Remark 4
  • Remark 5
  • Theorem 2
  • ...and 25 more