Optimal Nuisance Function Tuning for Estimating a Doubly Robust Functional under Proportional Asymptotics

Sean McGrath, Debarghya Mukherjee, Rajarshi Mukherjee, Zixiao Jolene Wang

TL;DR

The paper addresses estimation of the Expected Conditional Covariance (ECC) functional under proportional asymptotics, where the nuisance-estimation rates that standard double machine learning (DML) relies on fail. It develops debiased, ridge-based versions of three ECC estimators (INT, NR, DR) under two- and three-split sample strategies, proving $\sqrt{n}$-consistency and deriving exact asymptotic variances. A key finding is that tuning nuisance functions to minimize prediction error does not always minimize the variance of the ECC estimator, so tuning parameters should be chosen with the final inferential target in mind. The work leverages random matrix theory and the Marchenko-Pastur law to characterize asymptotic biases and variances, and demonstrates practical implications via extensive simulations and a parametric bootstrap procedure. This provides concrete guidance for robust inference in high-dimensional, over-parameterized settings where nuisance functions cannot be consistently estimated.

Abstract

In this paper, we explore the asymptotically optimal tuning parameter choice in ridge regression for estimating nuisance functions of a statistical functional that has recently gained prominence in conditional independence testing and causal inference. Given a sample of size $n$, we study estimators of the Expected Conditional Covariance (ECC) between variables $Y$ and $A$ given a high-dimensional covariate $X \in \mathbb{R}^p$. Under linear regression models for $Y$ and $A$ on $X$ and the proportional asymptotic regime $p/n \to c \in (0, \infty)$, we evaluate three existing ECC estimators and two sample splitting strategies for estimating the required nuisance functions. Since no consistent estimator of the nuisance functions exists in the proportional asymptotic regime without imposing further structure on the problem, we first derive debiased versions of the ECC estimators that utilize the ridge regression nuisance function estimators. We show that our bias correction strategy yields $\sqrt{n}$-consistent estimators of the ECC across different sample splitting strategies and estimator choices. We then derive the asymptotic variances of these debiased estimators to illustrate the nuanced interplay between the sample splitting strategy, estimator choice, and tuning parameters of the nuisance function estimators for optimally estimating the ECC. Our analysis reveals that prediction-optimal tuning parameters (i.e., those that optimally estimate the nuisance functions) may not lead to the lowest asymptotic variance of the ECC estimator -- thereby demonstrating the need to be careful in selecting tuning parameters based on the final goal of inference. Finally, we verify our theoretical results through extensive numerical experiments.
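To make the setting in the abstract concrete, here is a minimal simulation sketch. It is not the paper's debiased estimator: it fits ridge regressions for $A$ and $Y$ on one half of the sample and averages the product of residuals on the other half, a doubly-robust-style form chosen here for illustration. The data-generating process (Gaussian covariates with identity covariance, errors with covariance `rho`), the ridge normalization, and all function names are assumptions. Under this linear setup with errors independent of $X$, the ECC equals `rho`, so the gap between the output and `rho` gives a rough sense of the bias that the paper's correction is meant to remove.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients (X'X/n + lam*I)^{-1} X'y/n (normalization is an assumption)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def naive_product_residual_ecc(X, A, Y, lam, rng):
    """Two-split, NON-debiased sketch.

    Fit both ridge nuisances on one half of the data, then average the
    product of residuals on the other half.
    """
    n = X.shape[0]
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]
    alpha_hat = ridge_fit(X[train], A[train], lam)
    beta_hat = ridge_fit(X[train], Y[train], lam)
    res_a = A[test] - X[test] @ alpha_hat
    res_y = Y[test] - X[test] @ beta_hat
    return float(np.mean(res_a * res_y))


def simulate(n=1000, c=0.5, rho=0.5, lam=1.0, seed=0):
    """Draw one data set with p = c*n covariates and return the naive estimate."""
    rng = np.random.default_rng(seed)
    p = int(c * n)
    X = rng.standard_normal((n, p))
    # Hypothetical coefficient vectors, scaled so the signal strength stays O(1).
    alpha = rng.standard_normal(p) / np.sqrt(p)
    beta = rng.standard_normal(p) / np.sqrt(p)
    # Errors with unit variances and covariance rho, so the true ECC is rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)
    A = X @ alpha + eps[:, 0]
    Y = X @ beta + eps[:, 1]
    return naive_product_residual_ecc(X, A, Y, lam, rng)


if __name__ == "__main__":
    for lam in (0.1, 1.0, 10.0):
        print(lam, simulate(lam=lam))
```

Averaging `simulate` over many seeds for each `lam` would give a Monte Carlo picture of how the bias and variance of this naive estimator move with the tuning parameter, which is the kind of trade-off the abstract describes.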

Paper Structure

This paper contains 45 sections, 11 theorems, 312 equations, 15 figures, 4 tables.

Key Result

Theorem 3.1

Consider the debiased version of the integral-based estimator $\hat{\theta}^{\rm INT, db}$ either obtained by a two-split approach (Equation eq:int_db_2sp) or a three-split approach (Equation eq:int_db_3sp). Assume $X_{ij}$ are iid subgaussian random variables with mean $0$, variance $1$, with unifo

Figures (15)

  • Figure 1: Variances of debiased estimators as functions of tuning parameter $\lambda$. Top row: Two-split ($N = 1000$ total, split into two subsamples of 500 each). Bottom row: Three-split ($N = 1500$ total, split into three subsamples of 500 each). The red line indicates the $\lambda$ that is optimal for the estimator variance; the blue line indicates the prediction-optimal $\lambda$. The figures shown here are zoomed-in views; complete versions are included in the Appendix.
  • Figure 2: Bias of the non-debiased estimators (black dots) and the debiased versions (blue dots) in the setting with two splits and $c = 0.5$. The red lines illustrate the derived asymptotic bias of the estimators, with the horizontal line at zero.
  • Figure 3: Variance of the debiased estimators in the setting with two splits and $c = 0.5$. The red line indicates the value of $\lambda$ resulting in the smallest asymptotic variance for estimating $\rho$; the blue line indicates the prediction-optimal $\lambda$.
  • Figure 4: Bias of the non-debiased estimators (black dots) and the debiased versions (blue dots) in the setting with two splits and $c = 2$. The red lines illustrate the derived asymptotic bias of the estimators, with the horizontal line at zero.
  • Figure 5: Variance of the debiased estimators in the setting with two splits and $c = 2$. The red line indicates the value of $\lambda$ resulting in the smallest asymptotic variance for estimating $\rho$; the blue line indicates the prediction-optimal $\lambda$.
  • ...and 10 more figures

Theorems & Definitions (18)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Proposition D.1
  • proof
  • Proposition D.2
  • proof
  • Lemma E.1
  • proof
  • ...and 8 more