Inference for Projection Parameters in Linear Regression: beyond $d = o(n^{1/2})$
Woonyoung Chang, Arun Kumar Kuchibhotla, Alessandro Rinaldo
TL;DR
This work tackles inference for projection parameters in high-dimensional, possibly misspecified linear regression by developing a bias-corrected least squares estimator that remains $\sqrt{n}$-consistent when $d = o(n^{2/3})$ (up to polylog factors). It provides explicit finite-sample Berry–Esseen bounds for both unnormalized and studentized linear contrasts, enabling accurate distributional approximations without relying on variance estimation. The authors introduce three inference methods (HulC, $t$-statistic based, and bootstrap) that yield valid confidence regions under minimal moment assumptions and without requiring $d$ to scale as $o(n^{1/2})$. They also establish consistency results for the sandwich variance estimator and discuss extensions to push the dimension range further via higher-order $U$-statistics. The numerical studies illustrate the practical performance of the proposed approaches in both well-specified and misspecified settings, highlighting robustness and competitive coverage across a range of $(n,d)$ pairs.
Abstract
We consider the problem of inference for projection parameters in linear regression with increasing dimensions. This problem has been studied under a variety of assumptions in the literature. The classical asymptotic normality result for the least squares estimator of the projection parameter only holds when the dimension $d$ of the covariates is of a smaller order than $n^{1/2}$, where $n$ is the sample size. Traditional sandwich estimator-based Wald intervals are asymptotically valid in this regime. In this work, we propose a bias correction for the least squares estimator and prove the asymptotic normality of the resulting debiased estimator. Precisely, we provide an explicit finite-sample Berry–Esseen bound on the Normal approximation to the law of the linear contrasts of the proposed estimator normalized by the sandwich standard error estimate. Our bound, under only finite moment conditions on covariates and errors, tends to 0 as long as $d = o(n^{2/3})$ up to polylogarithmic factors. Furthermore, we leverage recent methods of statistical inference that do not require an estimator of the variance to perform asymptotically valid statistical inference, which leads to sharper miscoverage control than Wald intervals. We provide a discussion of how our techniques can be generalized to increase the allowable range of $d$ even further.
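For orientation, the classical baseline mentioned above (a Wald interval for a linear contrast $c^\top\beta$ built from the least squares estimator and the sandwich variance estimate) can be sketched as follows. This is a minimal illustration of the standard construction, which the abstract describes as valid when $d = o(n^{1/2})$; it is not the debiased estimator proposed in this work. The function name `sandwich_wald_interval` and its arguments are hypothetical, and the HC0 form of the sandwich estimator is assumed.

```python
from statistics import NormalDist

import numpy as np


def sandwich_wald_interval(X, y, c, alpha=0.05):
    """Wald interval for c' beta from OLS with an HC0 sandwich variance.

    Illustrative baseline only (hypothetical helper, not the paper's
    bias-corrected estimator). Returns (estimate, lower, upper).
    """
    n, _ = X.shape
    gram = X.T @ X / n                              # sample Gram matrix
    beta_hat = np.linalg.solve(gram, X.T @ y / n)   # least squares estimate
    resid = y - X @ beta_hat
    # "Meat" of the sandwich: average of x_i x_i' * residual_i^2.
    meat = (X * resid[:, None] ** 2).T @ X / n
    w = np.linalg.solve(gram, c)                    # gram^{-1} c
    se = np.sqrt(w @ meat @ w / n)                  # sandwich standard error of c' beta_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)         # standard Normal quantile
    est = float(c @ beta_hat)
    return est, est - z * se, est + z * se
```

Because the sandwich variance does not assume a correctly specified linear model, the interval targets the projection parameter. For example, with an `n x d` covariate matrix `X`, a response vector `y`, and `c` set to the first standard basis vector, `sandwich_wald_interval(X, y, c)` returns the estimate of the first coordinate together with the endpoints of a nominal 95% interval.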
