It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation
Jikai Jin, Lester Mackey, Vasilis Syrgkanis
TL;DR
This work studies structure-agnostic estimation (SAE) of treatment effects in the partially linear model and shows that the achievable rate depends sharply on the treatment-noise distribution. With Gaussian treatment noise, the double machine learning (DML) estimator is proved to be minimax rate-optimal, establishing a fundamental barrier to improvements from distributional information. By contrast, when the treatment noise is non-Gaussian and independent of the covariates, the authors develop ACE, a cumulant-based, higher-order orthogonal estimation framework that is $r$-th order insensitive to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero, and that therefore converges faster than DML. The theory supplies minimax lower and upper bounds, together with identifiability conditions, that indicate when and by how much DML can be improved. Synthetic demand estimation experiments demonstrate the practical gains of ACE. Overall, the paper clarifies when distributional properties can be exploited to surpass DML within SAE and lays a foundation for high-order robust estimators in causal inference.
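The role of cumulants can be illustrated numerically. The following is a minimal sketch, not part of the paper: it computes plug-in estimates of the second, third, and fourth cumulants of a sample and compares Gaussian noise with a skewed alternative. The function name `sample_cumulants`, the sample size, and the choice of a centered Exponential(1) distribution are assumptions made purely for illustration.

```python
import numpy as np


def sample_cumulants(x):
    """Plug-in estimates of the 2nd, 3rd and 4th cumulants of a 1-D sample.

    Hypothetical helper for illustration; it uses the standard identities
    kappa_2 = m2, kappa_3 = m3, kappa_4 = m4 - 3 * m2**2, where m_k is the
    k-th central moment.
    """
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    m2 = np.mean(c**2)
    m3 = np.mean(c**3)
    m4 = np.mean(c**4)
    return {2: m2, 3: m3, 4: m4 - 3.0 * m2**2}


rng = np.random.default_rng(0)
n = 200_000  # illustrative sample size

# Gaussian noise: every cumulant of order three or higher is zero.
gaussian_noise = rng.normal(size=n)
# Centered Exponential(1) noise: the third cumulant equals 2.
skewed_noise = rng.exponential(size=n) - 1.0

k_gauss = sample_cumulants(gaussian_noise)
k_skew = sample_cumulants(skewed_noise)

print("Gaussian:", k_gauss)
print("Skewed:  ", k_skew)
```

For the Gaussian sample the third and fourth estimates should be close to zero, which is the case in which the summary above says DML cannot be improved. The skewed sample has a clearly non-zero third cumulant, which is the kind of distributional information a cumulant-based estimator can exploit.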
Abstract
Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of \citet{robinson1988root}, we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of \citet{mackey2018orthogonal}. Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These \emph{ACE} procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
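For readers unfamiliar with the DML baseline discussed in the abstract, here is a minimal sketch of the standard cross-fitted, residual-on-residual estimator in a partially linear model. The notation is assumed for illustration rather than taken from the paper: $Y = \theta T + g(X) + \varepsilon$ and $T = m(X) + \eta$, with $\eta$ the treatment noise. The least-squares nuisance learner `fit_linear`, the function name `dml_plm`, and the synthetic data-generating process are hypothetical stand-ins. In the structure-agnostic setting any black-box regressor could take the place of `fit_linear`.

```python
import numpy as np


def fit_linear(X, y):
    """Hypothetical stand-in nuisance learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def dml_plm(X, T, Y, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta.

    Assumed model (illustrative notation):
        Y = theta * T + g(X) + eps,   T = m(X) + eta.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)
    T_res = np.empty(len(Y))
    Y_res = np.empty(len(Y))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisance estimates are fit on the other folds only (cross-fitting).
        m_hat = fit_linear(X[train_idx], T[train_idx])  # estimates E[T | X]
        q_hat = fit_linear(X[train_idx], Y[train_idx])  # estimates E[Y | X]
        T_res[test_idx] = T[test_idx] - m_hat(X[test_idx])
        Y_res[test_idx] = Y[test_idx] - q_hat(X[test_idx])
    # Regress outcome residuals on treatment residuals.
    return float(T_res @ Y_res / (T_res @ T_res))


# Synthetic example with Gaussian treatment noise (all values are illustrative).
rng = np.random.default_rng(1)
n, d, theta_true = 5000, 3, 1.5
X = rng.normal(size=(n, d))
T = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
Y = theta_true * T + X @ np.array([0.3, 0.7, -1.0]) + rng.normal(size=n)

theta_hat = dml_plm(X, T, Y)
print(f"theta_hat = {theta_hat:.3f} (true value {theta_true})")
```

Because the estimate is built from products of residuals, first-order errors in either nuisance fit cancel. This is the first-order insensitivity that the higher-order ACE procedures described in the abstract extend when the treatment noise is non-Gaussian.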
