Bridging Root-$n$ and Non-standard Asymptotics: Adaptive Inference in M-Estimation

Kenta Takatsu, Arun Kumar Kuchibhotla

TL;DR

This work develops a universal, split-sample framework for honest confidence sets in M-estimation, addressing both regular and irregular (non-standard) asymptotics and enabling dimension-agnostic inference. It constructs lower confidence bounds via two complementary routes—concentration-inequality bounds and CLT-based bounds—while controlling the diameter of the resulting CI through curvature, entropy, and initial-estimator quality. The authors demonstrate the method across high-dimensional mean estimation, misspecified linear regression, Manski’s discrete choice, quantile estimation without positive densities, and discrete argmin inference, showing adaptivity of width to problem geometry and local regularity. A numerical study confirms robust finite-sample coverage in high dimensions and compares the proposed sets to Wald intervals, highlighting the practical trade-off between validity and width. The work also situates itself in the historical development of honest, adaptive inference and outlines several promising extensions to constrained problems and more complex models.

Abstract

This manuscript studies a general approach to construct confidence sets for the solution of population-level optimization, commonly referred to as M-estimation. Statistical inference for M-estimation poses significant challenges due to the non-standard limiting behaviors of the corresponding estimator, which arise in settings with increasing dimension of parameters, non-smooth objectives, or constraints. We propose a simple and unified method that guarantees validity in both regular and irregular cases. Moreover, we provide a comprehensive width analysis of the proposed confidence set, showing that the convergence rate of the diameter is adaptive to the unknown degree of instance-specific regularity. We apply the proposed method to several high-dimensional and irregular statistical problems.
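
As a rough illustration of the split-sample idea summarized above, here is a minimal sketch for the simplest case, a univariate mean under squared loss $m(\theta, z) = (z - \theta)^2$. It is not the authors' implementation. It assumes a CLT-based one-sided lower bound on $\mathbb{M}(\theta, P) - \mathbb{M}(\widehat{\theta}_1, P)$ computed on $D_2$, and it keeps every candidate $\theta$ whose lower bound is at most zero. That rule is justified because $\theta(P)$ minimizes the population objective, so the true difference at $\theta(P)$ is never positive. The function name `split_sample_ci_mean`, the random half-and-half split, and the grid width and resolution are all hypothetical choices made for this example.

```python
import numpy as np
from statistics import NormalDist


def split_sample_ci_mean(z, alpha=0.05, seed=0):
    """Sketch of a sample-split confidence set for a univariate mean.

    Hypothetical helper: squared loss, CLT-based lower bound, grid search.
    Returns (lo, hi, theta1), the range of retained grid points and the
    initial estimator.
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(z.size)
    d1 = z[perm[: z.size // 2]]   # D1: used only for the initial estimator
    d2 = z[perm[z.size // 2:]]    # D2: used only for the lower bounds
    n2 = d2.size

    theta1 = d1.mean()            # initial estimator computed on D1

    # Candidate grid centred at theta1 (width and resolution are assumptions).
    step = 12.0 * d2.std(ddof=1) / np.sqrt(n2) / 1000.0
    grid = theta1 + np.arange(-1000, 1001) * step

    # Loss differences m(theta, Z_i) - m(theta1, Z_i), one column per theta.
    xi = (d2[:, None] - grid[None, :]) ** 2 - (d2[:, None] - theta1) ** 2

    # One-sided CLT lower bound for the population loss difference.
    z_q = NormalDist().inv_cdf(1.0 - alpha)
    lower = xi.mean(axis=0) - z_q * xi.std(axis=0, ddof=1) / np.sqrt(n2)

    # Keep theta whenever the data cannot rule out "theta is no worse than theta1".
    keep = lower <= 0.0
    return grid[keep].min(), grid[keep].max(), theta1


if __name__ == "__main__":
    sample = np.random.default_rng(1).normal(loc=0.0, scale=1.0, size=2000)
    lo, hi, theta1 = split_sample_ci_mean(sample)
    print(f"theta1 = {theta1:.4f}, retained range = [{lo:.4f}, {hi:.4f}]")
```

In this one-dimensional case the retained set is an interval that always contains $\widehat{\theta}_1$, since the loss difference is identically zero there. In general the retained set need not be convex, so reporting only its minimum and maximum is a simplification specific to this sketch.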

Paper Structure (31 sections, 25 theorems, 302 equations, 3 figures)

This paper contains 31 sections, 25 theorems, 302 equations, 3 figures.

Key Result

Theorem 1

For any initial estimator $\widehat{\theta}_1$ computed on $D_1$, and any estimator $\widehat{\mathbb{M}}_n(\cdot)$ of $\mathbb{M}(\cdot, P)$ computed on $D_2$, we have In particular, if $\mathbb{V}_P(\theta(P), \widehat{\theta}_1)/\mathbb{C}_P^2(\widehat{\theta}_1) = o_p(1)$ uniformly over all $P\in\mathcal{P}$, then $\widehat{\mathrm{CI}}_n^{\dagger}$ is an asymptotically uniformly valid confid

Figures (3)

  • Figure 1: Comparison of the empirical coverages of the $95\%$ confidence sets; Asymptotic, Sample split and Sample split + Upper bound. The empirical coverages are computed from $1000$ replications. The figure displays the poor coverage accuracy of the Wald interval in high-dimensional settings, with coverage dropping as low as $50\%$ in the analyses. In contrast, the proposed sample-splitting procedures maintain robust validity as the dimension increases. The results further highlight the empirical conservativeness of the standard method, despite its theoretical optimality as claimed by Theorem \ref{thm:mean-ci-width}. We show that the small modification to the confidence set restores the nominal coverage.
  • Figure 2: The average ratio between the geometric means of semi-axes associated with the proposed method and the Wald intervals. The average is computed from $1000$ replications. The results roughly state that the sample-splitting procedure enlarges the Wald interval in each univariate projection by at most a factor of $2$. When $d=2$, the ratio is approximately $1.9$. We emphasize that Figure \ref{fig:coverage} shows that the Wald interval becomes anti-conservative in high dimensions, with a miscoverage rate approaching $50\%$. The observed enlargement by a constant factor is expected, as the proposed procedure employs sample-splitting.
  • Figure A.1: An illustration of confidence sets for the bivariate mean $\theta = (\theta_1, \theta_2)^\top$, where the true parameter corresponds to $\theta(P) = (0, 0)^\top$. Three confidence sets are shown at confidence levels of $95\%$, $85\%$, and $75\%$. The confidence set based on the asymptotic distribution (left) yields an elliptical region while the proposed confidence sets (middle and right) are non-convex.

Theorems & Definitions (53)

  • Theorem 1
  • Theorem 2
  • Remark 1
  • Remark 2
  • Example 1: Empirical Bernstein inequality
  • Theorem 3
  • Remark 3
  • Theorem 4
  • Theorem 5
  • Lemma 6
  • ...and 43 more