Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
Sungyoon Lee, Sokbae Lee
TL;DR
This work studies the prediction and estimation risks of the ridgeless least squares estimator in overparameterized linear models under general regression-error structures. Using left-spherical symmetry of the design, it derives exact finite-sample variance expressions in which the dependence on the error covariance $\Omega$ separates from the design: $\mathbb{E}_X[\mathrm{Var}_\Sigma(\hat{\beta}\mid X)]=\frac{1}{n}\mathrm{Tr}(\Omega)\,\mathbb{E}_X[\mathrm{Tr}((X^\top X)^{\dagger}\Sigma)]$ and $\mathbb{E}_X[\mathrm{Var}(\hat{\beta}\mid X)]=\frac{1}{np}\mathrm{Tr}(\Omega)\,\mathbb{E}_X[\mathrm{Tr}(\Lambda^{\dagger})]$. The bias components are obtained under random-effects-type assumptions, which yields closed forms for $R_P(\hat{\beta})$ and $R_E(\hat{\beta})$. An asymptotic analysis based on the Stieltjes transform $s^*$ then reveals a double-descent pattern in the estimation risk. Numerical experiments with AR$(1)$ and clustered errors support the results and suggest that the benefits of overparameterization extend to time series, panel, and grouped data. Overall, the paper provides a realistic finite-sample framework for ridgeless interpolation under correlated errors and connects these finite-sample results to high-dimensional asymptotics through $s^*$.
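The first variance identity can be checked by simulation. What follows is a minimal sketch under stated assumptions, not the authors' code. It draws Gaussian designs $X = Z\Sigma^{1/2}$, which are left-spherical. It uses an AR(1)-type error covariance $\Omega_{ij}=\rho^{|i-j|}$ and a hypothetical diagonal $\Sigma$ with a decaying spectrum. It also reads $\mathrm{Var}_\Sigma(\hat{\beta}\mid X)$ as $\mathrm{Tr}(\Sigma\,\mathrm{Cov}(\hat{\beta}\mid X))$, where $\mathrm{Cov}(\hat{\beta}\mid X)=X^{\dagger}\Omega X^{\dagger\top}$ because $\hat{\beta}=X^{\dagger}y$. The function name `variance_terms` and all parameter values are illustrative.

```python
import numpy as np


def ar1_cov(n, rho):
    # AR(1)-type error covariance with unit diagonal: Omega[i, j] = rho ** |i - j|.
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def variance_terms(n=20, p=50, rho=0.5, n_draws=2000, seed=0):
    """Monte Carlo averages of both sides of the prediction-variance identity."""
    rng = np.random.default_rng(seed)
    omega = ar1_cov(n, rho)
    # Hypothetical feature covariance: diagonal with a decaying spectrum.
    sigma_diag = 1.0 / np.sqrt(np.arange(1, p + 1))
    sigma = np.diag(sigma_diag)
    sigma_half = np.diag(np.sqrt(sigma_diag))
    lhs_total, rhs_total = 0.0, 0.0
    for _ in range(n_draws):
        # Gaussian design X = Z Sigma^{1/2}; its law is invariant under X -> O X.
        x = rng.standard_normal((n, p)) @ sigma_half
        x_pinv = np.linalg.pinv(x)  # p x n, so the ridgeless estimator is x_pinv @ y
        cond_cov = x_pinv @ omega @ x_pinv.T  # Cov(beta_hat | X) = X^+ Omega X^{+T}
        lhs_total += np.trace(sigma @ cond_cov)  # Var_Sigma(beta_hat | X)
        xtx_pinv = x_pinv @ x_pinv.T  # equals (X^T X)^+
        rhs_total += np.trace(omega) / n * np.trace(xtx_pinv @ sigma)
    return lhs_total / n_draws, rhs_total / n_draws


if __name__ == "__main__":
    lhs, rhs = variance_terms()
    print(f"E[Var_Sigma] ~ {lhs:.4f}, Tr(Omega)/n * E[Tr((X'X)^+ Sigma)] ~ {rhs:.4f}")
```

The identity holds only in expectation over $X$, so the two averages agree up to Monte Carlo error. With $\rho=0$ (so $\Omega=I$) they coincide draw by draw.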
Abstract
In recent years, there has been significant growth in research focusing on minimum $\ell_2$ norm (ridgeless) interpolation least squares estimators. However, most of these analyses have been limited to an unrealistic regression error structure, assuming independent and identically distributed errors with zero mean and common variance. In this paper, we explore prediction risk as well as estimation risk under more general regression error assumptions, highlighting the benefits of overparameterization in a more realistic setting that allows for clustered or serial dependence. Notably, we establish that the estimation difficulties associated with the variance components of both risks can be summarized through the trace of the variance-covariance matrix of the regression errors. Our findings suggest that the benefits of overparameterization can extend to time series, panel and grouped data.
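The following hedged sketch illustrates the claim that the variance components depend on the error covariance only through its trace. It compares the estimation variance $\mathbb{E}_X[\mathrm{Tr}(\mathrm{Cov}(\hat{\beta}\mid X))]$ under i.i.d., AR(1), and clustered (block-equicorrelated) errors that all have unit diagonal, and hence the same trace $n$. It assumes isotropic Gaussian designs. The helper names, cluster size, and correlation values are hypothetical choices for illustration.

```python
import numpy as np


def ar1_cov(n, rho):
    # Serial dependence: Omega[i, j] = rho ** |i - j| (unit diagonal, trace n).
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def clustered_cov(n, cluster_size, rho):
    # Grouped dependence: equicorrelated within clusters, independent across them.
    omega = np.zeros((n, n))
    for start in range(0, n, cluster_size):
        stop = min(start + cluster_size, n)
        block = np.full((stop - start, stop - start), rho)
        np.fill_diagonal(block, 1.0)
        omega[start:stop, start:stop] = block
    return omega


def estimation_variance(omega, p=50, n_draws=2000, seed=1):
    """Monte Carlo estimate of E_X[Tr(Cov(beta_hat | X))] for isotropic Gaussian X."""
    n = omega.shape[0]
    rng = np.random.default_rng(seed)  # same seed -> same design draws for every omega
    total = 0.0
    for _ in range(n_draws):
        x_pinv = np.linalg.pinv(rng.standard_normal((n, p)))
        total += np.trace(x_pinv @ omega @ x_pinv.T)
    return total / n_draws


if __name__ == "__main__":
    n, p = 20, 50
    for name, omega in {
        "iid": np.eye(n),
        "ar1": ar1_cov(n, 0.5),
        "clustered": clustered_cov(n, 5, 0.5),
    }.items():
        print(name, np.trace(omega), estimation_variance(omega, p=p))
    print("Tr(Omega)/(p - n - 1) =", n / (p - n - 1))
```

Here $\mathrm{Tr}(X^{\dagger}\Omega X^{\dagger\top}) = \mathrm{Tr}(\Omega (XX^\top)^{-1})$ when $X$ has full row rank. For isotropic Gaussian $X$ with $p > n+1$, the inverse-Wishart mean gives $\mathbb{E}[(XX^\top)^{-1}] = I/(p-n-1)$. All three estimates should therefore sit near $\mathrm{Tr}(\Omega)/(p-n-1)$ despite their different dependence structures.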
