Double Cross-fit Doubly Robust Estimators: Beyond Series Regression

Alec McClean; Sivaraman Balakrishnan; Edward H. Kennedy; Larry Wasserman

TL;DR

The paper develops a rigorous theory for double cross-fit doubly robust (DCDR) estimators of the Expected Conditional Covariance (ECC) functional $\psi_{ecc} = \mathbb{E}\{\text{cov}(A,Y|X)\}$, providing a structure-agnostic expansion and a spectral radius control that underpin fast rates under minimal assumptions. When the nuisance functions $\pi$ and $\mu$ are Hölder smooth, DCDR with linear smoothers such as local polynomial regression or k-NN attains $\sqrt{n}$-consistency and asymptotic normality under mild conditions, with non-$\sqrt{n}$ minimax rates $n^{-(\alpha+\beta)/d}$ or $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$ depending on covariate-density knowledge. If the covariate density is known and smooth, covariate-density-adapted kernel regression yields minimax-optimal performance and can even satisfy a slower-than-$\sqrt{n}$ central limit theorem, enabling valid inference in regimes where standard $\sqrt{n}$ methods fail. Simulations corroborate the theory, illustrating when double cross-fitting and undersmoothing substantially improve on the standard single cross-fit SCDR-MSE approach and demonstrating asymptotic normality for undersmoothed DCDR in the non-$\sqrt{n}$ regime. The results offer practical guidance for constructing efficient, inference-ready estimators of causal functionals when the nuisance functions are smooth, and point to broader applicability to other mixed-bias functionals.
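As a hedged aid to reading the summary above, the identities below spell out the target functional and the arithmetic behind the two quoted rates. They assume that $\pi(X) = \mathbb{E}(A \mid X)$ and $\mu(X) = \mathbb{E}(Y \mid X)$ (the summary names these functions without defining them), that $\alpha$ and $\beta$ are their smoothness levels, that $d$ is the covariate dimension, and that $\widehat{\pi}$ and $\widehat{\mu}$ are fixed functions trained on data independent of $(X, A, Y)$.

$$
\begin{aligned}
\psi_{ecc} &= \mathbb{E}\{\text{cov}(A, Y \mid X)\} = \mathbb{E}\big[\{A - \pi(X)\}\{Y - \mu(X)\}\big] = \mathbb{E}(AY) - \mathbb{E}\{\pi(X)\mu(X)\}, \\
\mathbb{E}\big[\{A - \widehat{\pi}(X)\}\{Y - \widehat{\mu}(X)\}\big] - \psi_{ecc} &= \mathbb{E}\big[\{\pi(X) - \widehat{\pi}(X)\}\{\mu(X) - \widehat{\mu}(X)\}\big], \\
\frac{\alpha+\beta}{d} > \frac{1}{2} \iff 2\alpha + 2\beta > d &\iff \frac{2\alpha+2\beta}{2\alpha+2\beta+d} > \frac{1}{2}.
\end{aligned}
$$

The second line is the product ("mixed bias") structure that makes the estimator doubly robust: the error vanishes if either nuisance estimate is exact. The third line shows that both quoted rates are slower than $n^{-1/2}$ exactly when $(\alpha+\beta)/2 < d/4$, which is the non-$\sqrt{n}$ regime the summary refers to.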

Abstract

Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available then more accurate "double cross-fit doubly robust" (DCDR) estimators can be constructed by splitting the training data and undersmoothing nuisance function estimators on independent samples. We study a DCDR estimator of the Expected Conditional Covariance, a functional of interest in causal inference and conditional independence testing. We first provide a structure-agnostic error analysis for the DCDR estimator with no assumptions on the nuisance functions or their estimators. Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are $\sqrt{n}$-consistent and asymptotically normal under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime. When the covariate density and smoothnesses are known, we propose a minimax rate-optimal DCDR estimator based on undersmoothed kernel regression. Moreover, we show an undersmoothed DCDR estimator satisfies a slower-than-$\sqrt{n}$ central limit theorem, and that inference is possible even in the non-$\sqrt{n}$ regime. Finally, we support our theoretical results with simulations, providing intuition for double cross-fitting and undersmoothing, demonstrating where our estimator achieves $\sqrt{n}$-consistency while the usual "single cross-fit" estimator fails, and illustrating asymptotic normality for the undersmoothed DCDR estimator.
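The double cross-fitting idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's Algorithm: it assumes three equal folds, k-nearest-neighbors regression for both nuisance functions, and averaging over three fold rotations. The function names `knn_predict` and `dcdr_ecc` and the parameters `k_pi`, `k_mu`, and `seed` are hypothetical, introduced only for this example.

```python
import numpy as np


def knn_predict(X_train, y_train, X_eval, k):
    """k-NN regression: average the responses of the k nearest training points."""
    # Squared Euclidean distances, shape (n_eval, n_train).
    d2 = ((X_eval[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row (order within them is arbitrary).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def dcdr_ecc(X, A, Y, k_pi, k_mu, seed=0):
    """Sketch of a double cross-fit estimate of E{cov(A, Y | X)}.

    X must be a 2-D array of shape (n, d). The estimate of E(A | X) and the
    estimate of E(Y | X) are trained on two different folds, and the product
    of residuals is averaged on a third fold. Fold rotation is an assumption
    of this sketch.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), 3)
    estimates = []
    for shift in range(3):
        i_pi, i_mu, i_eval = (folds[(shift + j) % 3] for j in range(3))
        pi_hat = knn_predict(X[i_pi], A[i_pi], X[i_eval], k_pi)
        mu_hat = knn_predict(X[i_mu], Y[i_mu], X[i_eval], k_mu)
        estimates.append(np.mean((A[i_eval] - pi_hat) * (Y[i_eval] - mu_hat)))
    return float(np.mean(estimates))
```

Small values of `k_pi` and `k_mu` correspond to undersmoothing: each nuisance estimate is noisier, but because the two estimates are trained on independent folds their noise does not accumulate in the product of residuals. A single cross-fit variant would instead train both estimates on the same fold.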

Paper Structure (35 sections, 46 theorems, 245 equations, 7 figures, 1 algorithm)


Introduction
Structure of the paper and our contributions
Notation
Setup and background
Assumptions and lower bounds on estimation rates
Plug-in, doubly robust, and higher-order estimators
Double cross-fit doubly robust estimator
Structure-agnostic analysis
Hölder smoothness and local averaging estimators
Local averaging estimators
$\sqrt{n}$-consistency under minimal conditions
Minimax optimality and asymptotic normality in the non-$\sqrt{n}$ regime
Minimax optimality
Slower-than-$\sqrt{n}$ CLT
Simulations
...and 20 more sections

Key Result

Lemma 1

(Structure-agnostic linear expansion) Under Assumptions asmp:dgp and asmp:bdd_density, if $\psi_{ecc}$ is estimated with the DCDR estimator $\widehat{\psi}_n$ from Algorithm alg:dcdr, then … where $b_\eta \equiv b_\eta(X) = \mathbb{E} \{ \widehat{\eta}(X) - \eta(X) \mid X \}$ is the pointwise bias of the estimator $\widehat{\eta}$, $\rho(\Sigma_n)$ denotes the spectral radius of $\Sigma_n$, and where …

Figures (7)

Figure 1: QQ-plots of $100$ standardized DCDR estimates with undersmoothed local polynomial regressions and $100$ standardized SCDR-MSE estimates over sample size (columns) with Hölder$(0.35)$ smooth nuisance functions and dimension $1$.
Figure 2: The Doppler function with $N(0, 0.1)$ random noise as in \ref{eq:doppler}; this nuisance function was used for Figure \ref{fig:changing_k}.
Figure 3: Fold size (x-axis) versus optimal number of neighbors (y-axis), where optimal is in terms of average MSE over 500 datasets; triangles and circles indicate the k-Nearest-Neighbors estimators for $\pi(X)$ and $\mu(X)$, respectively, while diamonds indicate the SCDR estimator for the ECC and squares indicate the DCDR estimator for the ECC.
Figure 4: Example Hölder smooth functions (black) with smoothness $s \in \{0.1, 0.35, 0.6\}$ for $n \in \{100, 1000, 5000 \}$ observed data points (grey) with $N(0, 10)$ random noise.
Figure 5: QQ-plots of the standardized statistics for different dimensions and smoothnesses (columns) and fold sizes (rows). Black circles represent the DCDR estimator with known density and smoothness, orange squares represent the DCDR estimator with undersmoothed local polynomial regression, and blue triangles represent the SCDR-MSE estimator. The diagonal line is $y = x$.
...and 2 more figures

Theorems & Definitions (95)

Remark 1
Lemma 1
Remark 2
Proposition 1
Remark 3
Lemma 2
Theorem 1
Remark 4
Remark 5
Theorem 2
...and 85 more
