The distribution of Ridgeless least squares interpolators
Qiyang Han, Xiaocong Xu
TL;DR
The paper develops a high-dimensional distributional theory for the Ridgeless interpolator in overparametrized linear models by linking it to a Ridge estimator in an associated Gaussian sequence model, whose effective noise and regularization are determined by fixed-point equations. It establishes uniform distributional characterizations for a general class of weighted $\ell_q$ risks, quantifies the implicit regularization mechanism through the positive implicit regularization parameter $\tau_{\eta,\ast}$, and proves universality across Gaussian and non-Gaussian designs. The work further shows that cross-validation methods (GCV and $k$-fold CV) asymptotically optimize not only the prediction risk but also the estimation and in-sample risks, and yield debiased confidence intervals of optimal length. The results rest on mean-field arguments, proof strategies based on the convex Gaussian min-max theorem (CGMT), and a rigorous treatment of the fixed-point equations in both population and sample versions. Together, the findings give a precise distributional understanding of Ridgeless interpolation, show that cross-validation is useful beyond prediction risk, and offer principled tools for inference in highly overparametrized regimes.
Abstract
The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years in both the machine learning and statistics communities. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent theoretical research on its $\ell_2$-type risks reveals that its norm-minimizing property induces an 'implicit regularization' that helps prediction in spite of interpolation. This paper takes a further step that aims at understanding its precise stochastic behavior as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, which provides a precise quantification of this implicit regularization in the most general distributional sense. Our distributional characterizations hold for general non-Gaussian random designs and extend uniformly to positively regularized Ridge estimators. As a direct application, we obtain a complete characterization of a general class of weighted $\ell_q$ risks of the Ridge(less) estimators that were previously known only for $q=2$ by random matrix methods. These weighted $\ell_q$ risks include not only the standard prediction and estimation errors, but also non-standard covariate shift settings. Our uniform characterizations further reveal a surprising feature of the commonly used generalized and $k$-fold cross-validation schemes: tuning the estimated $\ell_2$ prediction risk by these methods alone leads to simultaneously optimal $\ell_2$ in-sample, prediction and estimation risks, as well as the optimal length of debiased confidence intervals.
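As a small numerical illustration of the object studied here (not of the paper's distributional results), the sketch below computes the minimum $\ell_2$-norm interpolator via the pseudoinverse and checks three standard facts: it fits the data exactly, it is the limit of the Ridge estimator as the regularization tends to zero, and any other interpolator has a larger norm. The sample sizes, noise level, isotropic Gaussian design and the $n\lambda$ scaling of the Ridge penalty are assumptions chosen only for illustration; the function names are hypothetical.

```python
import numpy as np


def ridgeless(X, y):
    """Minimum l2-norm solution of X b = y, computed with the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def ridge(X, y, lam):
    """Ridge estimator in dual form: X^T (X X^T + n*lam*I)^{-1} y.

    The n*lam scaling of the penalty is an assumed convention.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)


# Assumed toy setting: n < p (overparametrized), isotropic Gaussian design.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

b0 = ridgeless(X, y)

# 1. Interpolation: the fitted values reproduce y up to rounding error.
interp_err = np.max(np.abs(X @ b0 - y))

# 2. Ridge with a tiny penalty is numerically close to the Ridgeless solution.
gap = np.linalg.norm(ridge(X, y, 1e-8) - b0)

# 3. Adding a null-space direction of X gives another interpolator
#    with strictly larger l2 norm.
z = rng.standard_normal(p)
null_part = z - np.linalg.pinv(X) @ (X @ z)
other = b0 + null_part
other_err = np.max(np.abs(X @ other - y))

print(f"interpolation error: {interp_err:.2e}")
print(f"ridge(1e-8) vs ridgeless gap: {gap:.2e}")
print(f"norms: ridgeless={np.linalg.norm(b0):.3f}, other={np.linalg.norm(other):.3f}")
```

The dual form of the Ridge estimator is used because it only requires solving an $n \times n$ system, which is the cheaper choice when $p > n$.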
