When is it worthwhile to jackknife? Breaking the quadratic barrier for Z-estimators

Licong Lin, Fangzhou Su, Wenlong Mou, Peng Ding, Martin Wainwright

Abstract

Resampling methods are especially well-suited to inference with estimators that provide only "black-box" access. The jackknife is a form of resampling, widely used for bias correction and variance estimation, that is well understood under classical scaling, where the sample size $n$ grows for a fixed problem. We study its behavior when applied to estimating functionals using high-dimensional $Z$-estimators, allowing both the sample size $n$ and the problem dimension $d$ to diverge. We begin by showing that the plug-in estimator based on the $Z$-estimate suffers from a quadratic breakdown: while it is $\sqrt{n}$-consistent and asymptotically normal whenever $n \gtrsim d^2$, it fails for a broad class of problems whenever $n \lesssim d^2$. We then show that, under suitable regularity conditions, applying a jackknife correction yields an estimate that is $\sqrt{n}$-consistent and asymptotically normal whenever $n\gtrsim d^{3/2}$. This provides strong motivation for the use of the jackknife in high-dimensional problems where the dimension is moderate relative to the sample size. We illustrate consequences of our general theory for various specific $Z$-estimators, including non-linear functionals in linear models; generalized linear models; and the inverse propensity score weighting (IPW) estimate for the average treatment effect, among others.
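As a rough illustration of the bias correction the abstract refers to, the sketch below applies the classical leave-one-out jackknife, $n\,\tau(\widehat{\theta}_n) - \frac{n-1}{n}\sum_{i=1}^n \tau(\widehat{\theta}_{-i})$, to a plug-in estimate of the squared norm of a linear regression coefficient. This summary does not give the paper's exact definition of the jackknife estimate, so the classical form is an assumption here. The function names (`jackknife_corrected`, `plug_in_sq_norm`, `simulate`) and the simulation settings are also hypothetical and chosen only for illustration.

```python
import numpy as np

def jackknife_corrected(estimate, data):
    """Classical leave-one-out jackknife bias correction of a scalar estimate.

    `estimate` maps an (m, k) array (rows are samples) to a float.
    Assumption: this is the textbook form, not necessarily the paper's.
    """
    n = data.shape[0]
    full = estimate(data)
    # Recompute the estimate n times, each time leaving out one row.
    loo = np.array([estimate(np.delete(data, i, axis=0)) for i in range(n)])
    return n * full - (n - 1) * loo.mean()

def plug_in_sq_norm(data):
    """Plug-in estimate of ||theta||^2: fit OLS, then evaluate the functional."""
    X, y = data[:, :-1], data[:, -1]
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(theta_hat @ theta_hat)

def simulate(n=100, d=10, trials=200, seed=0):
    """Average error of both estimators over repeated draws (toy settings)."""
    rng = np.random.default_rng(seed)
    theta_star = np.ones(d) / np.sqrt(d)  # normalized so ||theta_star||^2 = 1
    errs = np.zeros((trials, 2))
    for t in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ theta_star + rng.standard_normal(n)
        data = np.column_stack([X, y])
        errs[t, 0] = plug_in_sq_norm(data) - 1.0
        errs[t, 1] = jackknife_corrected(plug_in_sq_norm, data) - 1.0
    return errs.mean(axis=0)

if __name__ == "__main__":
    plug_bias, jack_bias = simulate()
    print(f"plug-in bias: {plug_bias:+.3f}, jackknife bias: {jack_bias:+.3f}")
```

The correction treats the estimator as a black box: it needs only the ability to refit on $n$ leave-one-out datasets, at the cost of $n$ extra fits. As a sanity check on the formula, applying `jackknife_corrected` to the biased sample variance (`np.var` with its default `ddof=0`) recovers the unbiased sample variance exactly.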

Paper Structure

This paper contains 101 sections, 19 theorems, 288 equations, 7 figures.

Key Result

Theorem 1

Given a sample size $n$ satisfying the lower bound eq:linear_approx_Z_est_sample_ahead, and under the conditions ass:smooth_function, ass:tail and ass:convergence, there is a constant $C' = C'(L,\sigma, \rho, \gamma)$ such that with probability at least $1 - \delta$.

Figures (7)

  • Figure 1: Illustration of the quadratic barrier for the plug-in estimator when used for estimating a non-linear functional in linear regression (see \ref{ExaQuadLinear} for details) with $n = 400$ samples in dimension $d = 20$. Each panel shows histograms of the $\sqrt{n}$-rescaled error for the plug-in estimator (red dashed line, shaded) and the jackknife-corrected estimator (blue solid line). The standard Gaussian density is shown as a black dash-dotted line for comparison. (a) Results for a well-specified linear model. (b) Results for a mis-specified linear model. In both cases, the plug-in estimator exhibits a large positive bias.
  • Figure 2: Plots of the bias, MSE, coverage, and coverage length for three different estimators ${\tau}({\widehat{\theta}}_n)$, ${\widehat{{\tau}}_{ \mathrm{jac}}}$, and $\widehat{{\tau}}_{\text{unbiased}}$ in \ref{SecQuadSim} for linear regression. With sample size $n = 400$, we set the dimension $d = [n^{r}]$ for exponents $r \in [0.1, 0.9]$, and show plots with the exponent $r$ on the horizontal axis. The points on the curves in each plot are obtained by taking a Monte Carlo average over $T=1000$ independent trials; error bars denote $\pm 1$ standard deviation.
  • Figure 3: Plots of the bias, MSE, coverage, and coverage length for three different estimators ${\tau}({\widehat{\theta}}_n)$, ${\widehat{{\tau}}_{ \mathrm{jac}}}$, and $\widehat{{\tau}}_{\text{unbiased}}$ in \ref{SecQuadSim} when $d = [n^{2/3}]$ and $n = 2^s \times 320$ for $s = 0,\ldots, 5$. The plots are obtained by averaging over $T=1000$ trials; error bars denote $\pm 1$ standard deviation.
  • Figure 4: Plots of the bias, MSE, coverage, and coverage length for ${\tau}({\widehat{\theta}}_n)$, ${\widehat{{\tau}}_{ \mathrm{jac}}}$, and $\widehat{\tau}_{\text{jeffreys}}$ in \ref{SecLogSim} for logistic regression. With fixed sample size $n=400$, we set the problem dimension $d=[n^{r}]$ for exponents $r \in [0.1, 0.9]$. The results are obtained by averaging over $T=1000$ trials; error bars denote $\pm 1$ standard deviation.
  • Figure 5: Plots of the bias, MSE, coverage, and coverage length for ${\tau}({\widehat{\theta}}_n)$, ${\widehat{{\tau}}_{ \mathrm{jac}}}$, and $\widehat{\tau}_{\text{jeffreys}}$ in \ref{SecLogSim} when $d = [n^{2/3}]$ and $n = 2^s \times 320$ for $s = 0,\ldots, 5$. The plots are obtained by averaging over $T = 1000$ trials; error bars denote $\pm 1$ standard deviation.
  • ...and 2 more figures

Theorems & Definitions (24)

  • Example 1: $Z$-estimators for linear models
  • Example 2: Logistic regression and GLMs
  • Example 3: IPW estimator for the average treatment effect
  • Example 4: Instrumental variables estimate
  • Theorem 1
  • Example 5: Quadratic functionals in linear regression
  • Theorem 2
  • Lemma 1
  • Corollary 1: High-dimensional asymptotics for plug-in
  • Corollary 2: High-dimensional asymptotics for ${\widehat{{\tau}}_{ \mathrm{jac}}}$
  • ...and 14 more