It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Jikai Jin, Lester Mackey, Vasilis Syrgkanis

TL;DR

This work investigates structure-agnostic estimation (SAE) for causal inference in a partially linear model and reveals a sharp dependence on the treatment-noise distribution. It proves that with Gaussian treatment noise, the doubly robust DML estimator is minimax rate-optimal, establishing a fundamental barrier to improvements from distributional information. By contrast, when treatment noise is non-Gaussian, the authors develop ACE, a cumulant-based, higher-order orthogonal estimation framework that achieves arbitrarily high-order insensitivity to nuisance errors and improved rates of convergence, under independence between noise and covariates. Synthetic experiments demonstrate the practical gains of ACE in demand estimation tasks, while theoretical results provide minimax lower/upper bounds and identifiability conditions guiding when and how much improvement is possible. Overall, the paper clarifies when distributional properties can be exploited to surpass DML within SAE and lays a foundation for high-order robust estimators in causal inference.

Abstract

Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of Robinson (1988), we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of Mackey et al. (2018). Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These ACE procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
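
As a rough illustration of the DML baseline discussed in the abstract, the sketch below implements a cross-fitted residual-on-residual estimate of the treatment coefficient in a partially linear model of the form Y = theta*T + f(X) + eps, T = g(X) + eta. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`fit_line`, `dml_plm`, `simulate`), the one-dimensional linear learner standing in for a black-box nuisance estimator, and the simulated data with uniform (non-Gaussian) treatment noise.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a + b*x; a stand-in for any black-box learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def dml_plm(X, T, Y, n_folds=2):
    """Cross-fitted residual-on-residual estimate of theta."""
    n = len(X)
    num = den = 0.0
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance estimates are fit on the other fold(s) only.
        g_hat = fit_line([X[i] for i in train], [T[i] for i in train])  # E[T | X]
        q_hat = fit_line([X[i] for i in train], [Y[i] for i in train])  # E[Y | X]
        for i in test:
            t_res = T[i] - g_hat(X[i])
            y_res = Y[i] - q_hat(X[i])
            num += t_res * y_res
            den += t_res * t_res
    return num / den


def simulate(n, theta=1.5, seed=0):
    """Hypothetical data: linear confounding, uniform treatment noise."""
    rng = random.Random(seed)
    X = [rng.gauss(0, 1) for _ in range(n)]
    T = [0.8 * x + rng.uniform(-1, 1) for x in X]
    Y = [theta * t + 2.0 * x + rng.gauss(0, 1) for t, x in zip(T, X)]
    return X, T, Y


X, T, Y = simulate(5000)
theta_hat = dml_plm(X, T, Y)
print(f"DML estimate of theta: {theta_hat:.3f}")
```

In this toy setup the linear learner is correctly specified, so the estimate should land close to the simulated coefficient; the error rates analyzed in the paper concern the harder case where the nuisance estimates carry non-negligible error.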

Paper Structure

This paper contains 50 sections, 65 theorems, 347 equations, 3 figures, 1 algorithm.

Key Result

Theorem 3.1

Fix any $C_{\uptheta}>0$, $c_q,\delta\in(0,\frac{1}{4})$, and $K\in\mathbb{N}_+$, and let [...]. If $\left\| {\epsilon} \right\|_{\infty}\leq \delta/2$, then for any estimates $\hat{h}=(\hat{g},\hat{q})$ with $c_q \leq \hat{q}(X)\leq 1-c_q$ and $\hat{g}(X)(1-\hat{g}(X))\geq A^{-1}\delta$ almost surely, we have, for any $\gamma\in(1/2,1)$, [...], where $c_\gamma$ is a constant that depends only on $\gamma$.

Figures (3)

  • Figure 1: Comparison of first- through fifth-order ACE estimation (Algorithm 1) in the synthetic demand estimation setting of the experiments section. Fourth-order ACE is omitted due to substantially larger error. All quality measures and shaded 95% confidence bands are estimated using $20000$ independent replicates of the experiment.
  • Figure 2: The sensitivity of ACE estimators to correlation of the covariate $X$ and the noise variable $\eta$.
  • Figure 3: Experiment results for ACE estimators with fixed sample size $n=10000$ and varying sparsity.

Theorems & Definitions (80)

  • Definition 2.1: Data generating distributions, target parameters, and nuisance functions
  • Definition 2.2: Uncertainty sets
  • Definition 2.3: Minimax estimation error
  • Definition 3.1: Set of feasible distributions
  • Theorem 3.1: Structure-agnostic lower bound for binary treatment
  • Theorem 3.2: The Gaussian treatment barrier
  • Lemma 4.1: Explicit formula for $J_r$
  • Theorem 4.1: Structure-agnostic error from estimated moments
  • Theorem 5.1: Efficient cumulant estimator for noise with finite moments
  • Theorem 5.2: Efficient cumulant estimator for sub-Gaussian noise
  • ...and 70 more
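
The abstract ties $r$-th order insensitivity to the $(r+1)$-st treatment-noise cumulant being non-zero, so it can help to see how cumulants separate Gaussian from non-Gaussian noise. The sketch below computes naive plug-in sample cumulants of orders 2 to 4 from central moments. It is only an illustration under stated assumptions: the helper `sample_cumulants` is hypothetical, it works on directly observed noise draws, and it is not the efficient structure-agnostic cumulant estimator of Theorems 5.1 and 5.2.

```python
import random


def sample_cumulants(xs):
    """Plug-in cumulants of orders 2, 3, 4 computed from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # kappa_2 = m2, kappa_3 = m3, kappa_4 = m4 - 3 * m2^2
    return {2: m2, 3: m3, 4: m4 - 3.0 * m2 ** 2}


rng = random.Random(0)
n = 200_000
gaussian = sample_cumulants([rng.gauss(0, 1) for _ in range(n)])
uniform = sample_cumulants([rng.uniform(-1, 1) for _ in range(n)])
exponential = sample_cumulants([rng.expovariate(1.0) for _ in range(n)])

for name, k in [("gaussian", gaussian), ("uniform", uniform), ("exponential", exponential)]:
    print(name, {order: round(value, 3) for order, value in k.items()})
```

For Gaussian noise every cumulant of order three and above is zero in the population, which matches the setting of the Gaussian treatment barrier (Theorem 3.2). Uniform noise on $[-1,1]$ has a zero third cumulant but a fourth cumulant of $-2/15$, and exponential noise with rate 1 has a third cumulant of 2. Read against the abstract's condition, a skewed noise such as the exponential would qualify at $r=2$, while a symmetric noise such as the uniform would qualify at $r=3$ but not at $r=2$.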