Perturbed Double Machine Learning: Nonstandard Inference Beyond the Parametric Length

Mengchu Zheng, Matteo Bonvini, Zijian Guo

TL;DR

The proposal is to inject randomness into the nuisance estimation step to generate perturbed nuisance models, each yielding an estimate of $\beta$ and a Wald interval, and to filter out perturbations whose deviations from the original DML estimate exceed a threshold.

Abstract

We study inference on a low-dimensional functional $\beta$ in the presence of infinite-dimensional nuisance parameters. Classical inferential methods are typically based on Wald intervals, whose large-sample validity rests on asymptotic negligibility of nuisance error; for example, influence-curve-based estimators (Double/Debiased Machine Learning, DML) are asymptotically Gaussian when nuisance estimators converge faster than $n^{-1/4}$. Although such negligibility can hold even in nonparametric classes, it can be restrictive. To relax this requirement, we propose Perturbed Double Machine Learning, which ensures valid inference even when nuisance estimators converge slower than $n^{-1/4}$. Our proposal is to (i) inject randomness into the nuisance estimation step to generate perturbed nuisance models, each yielding an estimate of $\beta$ and a Wald interval, and (ii) filter out perturbations whose deviations from the original DML estimate exceed a threshold. For Lasso nuisance learners, we show that, with high probability, at least one perturbation yields nuisance estimates sufficiently close to the truth, so the associated estimator of $\beta$ is close to an oracle with known nuisances. The union of retained intervals delivers valid coverage even when the DML estimator converges slower than $n^{-1/2}$. The framework extends to general machine-learning nuisance learners, and simulations show that it maintains coverage in settings where state-of-the-art methods fail.
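The filtering and aggregation steps (ii) described in the abstract can be illustrated with a minimal sketch. It assumes the perturbed estimates and their standard errors have already been computed by some perturbed nuisance-fitting step, which is not shown. The function names `wald_interval` and `perturbed_union_ci`, the `threshold` argument, and the example numbers are hypothetical and chosen only for illustration; the paper's actual perturbation mechanism and threshold rule are not reproduced here.

```python
from statistics import NormalDist


def wald_interval(est, se, alpha=0.05):
    """Two-sided Wald interval est +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (est - z * se, est + z * se)


def perturbed_union_ci(beta_hat, perturbed, threshold, alpha=0.05):
    """Filter perturbed estimates and return the union of retained Wald intervals.

    beta_hat  : original (unperturbed) DML point estimate
    perturbed : list of (estimate, standard_error) pairs, one per perturbation
    threshold : hypothetical cutoff; a perturbation is dropped when its
                estimate deviates from beta_hat by more than this amount
    Returns the union as a sorted list of disjoint (lower, upper) intervals.
    """
    # Step 1 (filter): keep only perturbations close to the original estimate.
    kept = [
        wald_interval(b, se, alpha)
        for b, se in perturbed
        if abs(b - beta_hat) <= threshold
    ]

    # Step 2 (aggregate): merge overlapping intervals so the result is the
    # exact union of the retained intervals.
    merged = []
    for lo, hi in sorted(kept):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Illustrative inputs (made up): the third perturbation is far from beta_hat
# and is filtered out before the union is taken.
example_ci = perturbed_union_ci(
    beta_hat=0.5,
    perturbed=[(0.48, 0.05), (0.55, 0.05), (1.5, 0.05)],
    threshold=0.2,
)
```

Returning a list of disjoint intervals keeps the union exact; if the retained intervals all overlap, as in the example inputs, the list contains a single interval.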

Paper Structure

This paper contains 49 sections, 16 theorems, 291 equations, 16 figures, 2 algorithms.

Key Result

Theorem 1

Suppose Assumption assumption:main_lasso holds and the penalty parameters $\lambda_\eta^{[m]}$ and $\lambda_\gamma^{[m]}$ in eq: lasso optimization problem eta m satisfy $\lambda_\eta^{[m]} = C n^{-1/2}{\rm err}_{n,p}(M;\alpha_0)$ and $\lambda_\gamma^{[m]} = C n^{-1/2}{\rm err}_{n,p}(M;\alpha_0)$ f with the oracle DML estimator $\widehat{\beta}^{\rm ora}$ defined in eq: betaHat ora and ${\rm err}

Figures (16)

  • Figure 1: Workflow of the Perturbed DML procedure.
  • Figure 2: DML with high-dimensional sparse linear models where the sparsity level $s$ ranges from 5 to 100. (A): Absolute empirical bias of the DML estimator. (B): Estimated and empirical standard errors. (C): Empirical coverage of Wald CIs. Results are calculated based on 1000 simulations.
  • Figure 3: Empirical distributions of $\widehat{\beta}^{[m^*]}$ and $\widehat{\beta}$ in Example 1 with $s=150$ and $M=500$, where the dashed curve represents the distribution of $\widehat{\beta}^{\rm ora}$, namely $N(\beta,n^{-1}{\rm Var}\{\varphi(O_i;\beta)\})$.
  • Figure 4: Illustration of filtering and aggregation using Example 1 with $n=1000$, $p=500$, $s=100$, $M=100$ and $\pi^*=0.95$. (A): 100 perturbed intervals from a single simulation, where the red and blue segments are the Perturbed DML CI (eq:filtered union CI) and the Wald CI (eq: wald), respectively. (B): Boxplots of lower and upper limits of the Perturbed DML CIs and Wald CIs across 500 simulations; boxes indicate the 25th to 75th percentiles. The black dashed line denotes the true parameter $\beta=0.5$.
  • Figure 5: Empirical analysis of the proposed CI in Example 1 with $n=1000$, $p=500$, $s=120$, $\pi^*=100\%$ (no filtering) and $M$ ranging from 1 to $10^4$. (A): Maximum of the deviations $\{|\widehat{\beta}^{[m]} - \widehat{\beta}|\}_{1\leq m\leq M}$, minimum of the distances $\{|\widehat{\beta}^{[m]} - \widehat{\beta}^{\rm ora}|\}_{1\leq m\leq M}$, the deviation from the oracle estimator $|\widehat{\beta}^{\rm ora} - \widehat{\beta}|$, and half the Wald CI width $z_{\alpha/2}\widehat{\rm SE}(\widehat{\beta})$, for $M\leq 500$. (B): Zoomed-out view of (A) for $M\leq 10^4$. (C): Empirical coverage of the proposed CIs. (D): Average CI lengths. The black dashed line marks $M=500$. Results are averaged across 1000 simulations.
  • ...and 11 more figures

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4: Theorem 1 in the paper
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 11 more