High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile
Jérémie Bigot, Issa-Mbenard Dabo, Camille Male
TL;DR
This work extends the high-dimensional analysis of ridge regression to independent but non-identically distributed predictors modeled by a variance profile, $X_n = \Upsilon_n \circ Z_n$, where $\circ$ denotes the entrywise (Hadamard) product. Using random matrix theory and Dyson-type fixed-point equations, the authors derive deterministic equivalents for the ridge degrees of freedom and for the training and predictive risks, capturing how the ratio $p/n$ and the shape of the variance profile affect the risk. A key result is that the diagonal of the resolvent $Q_p(z)$ has a deterministic equivalent $T_p(z)$, which yields explicit risk formulas in terms of $T_p(-\lambda)$ and its derivative. These formulas show that double descent persists under many variance profiles, while certain profiles produce other shapes (e.g., triple or quadruple descent). Numerical experiments with synthetic variance profiles and MNIST-based data validate the theory. The results offer a tool for analyzing ridge regression in heteroscedastic, mixture-like settings and suggest extensions to other estimators and correlated data.
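To make the variance-profile model concrete, the following is a minimal simulation sketch, not the authors' code. It assumes Gaussian entries for $Z_n$, the normalization $S = X_n^\top X_n / n$, and the standard definition of the ridge degrees of freedom as $\mathrm{tr}[S (S + \lambda I)^{-1}]$; the paper's exact normalization may differ. The function names and the two-block profile are hypothetical choices made for illustration.

```python
import numpy as np


def sample_variance_profile_data(upsilon, rng):
    """Draw X = upsilon ∘ Z, where Z has i.i.d. standard Gaussian entries.

    Assumption: Gaussian Z is used only for illustration.
    """
    z = rng.standard_normal(upsilon.shape)
    return upsilon * z  # entrywise (Hadamard) product


def ridge_degrees_of_freedom(x, lam):
    """Empirical ridge degrees of freedom tr[S (S + lam I)^{-1}], S = X^T X / n.

    Assumption: this normalization is a common convention, not necessarily
    the one used in the paper.
    """
    n = x.shape[0]
    eigvals = np.linalg.eigvalsh(x.T @ x / n)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return float(np.sum(eigvals / (eigvals + lam)))


rng = np.random.default_rng(0)
n, p = 200, 100

# Hypothetical two-block profile: the first half of the rows has
# standard deviation 2, the second half has standard deviation 1.
upsilon = np.ones((n, p))
upsilon[: n // 2, :] = 2.0

x = sample_variance_profile_data(upsilon, rng)
df_small = ridge_degrees_of_freedom(x, lam=0.1)
df_large = ridge_degrees_of_freedom(x, lam=10.0)
print(f"degrees of freedom: lam=0.1 -> {df_small:.2f}, lam=10 -> {df_large:.2f}")
```

The empirical quantity computed here is the one that a deterministic equivalent would approximate as $n$ and $p$ grow proportionally; it decreases as $\lambda$ increases and never exceeds $\min(n, p)$.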
Abstract
High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data. To this end, we suppose that the set of observed predictors (or features) is a random matrix with a variance profile and with dimensions growing at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degrees of freedom of the ridge estimator. For certain classes of variance profiles, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results are based on tools from random matrix theory in the presence of a variance profile that have not been considered so far to study regression models. Numerical experiments are provided to show the accuracy of the aforementioned deterministic equivalents on the computation of the predictive risk of ridge regression. We also investigate the similarities and differences that exist with the standard setting of independent and identically distributed data.
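The behavior of the predictive risk as the ridge parameter becomes small can be explored with a short Monte Carlo sketch. This is an illustration under stated assumptions, not the paper's experimental setup: the random effect model is taken as $Y = X\theta + \varepsilon$ with $\theta$ having i.i.d. centered entries of variance $\alpha^2/p$, the noise is Gaussian with variance $\sigma^2$, and the risk is estimated on a fresh draw of predictors with the same variance profile. The function names, the constant profile, and all parameter values are hypothetical.

```python
import numpy as np


def ridge_estimator(x, y, lam):
    """Ridge estimator (X^T X / n + lam I)^{-1} X^T y / n (assumed normalization)."""
    n, p = x.shape
    return np.linalg.solve(x.T @ x / n + lam * np.eye(p), x.T @ y / n)


def monte_carlo_predictive_risk(upsilon, lam, alpha=1.0, sigma=1.0, n_rep=20, seed=0):
    """Average out-of-sample squared prediction error over n_rep replications.

    Assumptions: Gaussian Z and noise, random effect theta with
    E[theta theta^T] = (alpha^2 / p) I, and test predictors drawn
    with the same variance profile as the training predictors.
    """
    rng = np.random.default_rng(seed)
    n, p = upsilon.shape
    errors = []
    for _ in range(n_rep):
        theta = rng.standard_normal(p) * alpha / np.sqrt(p)
        x = upsilon * rng.standard_normal((n, p))        # X = upsilon ∘ Z
        y = x @ theta + sigma * rng.standard_normal(n)
        theta_hat = ridge_estimator(x, y, lam)
        x_new = upsilon * rng.standard_normal((n, p))    # fresh predictors
        errors.append(np.mean((x_new @ (theta_hat - theta)) ** 2))
    return float(np.mean(errors))


n = 60
# Constant profile (the i.i.d. case) with a small ridge parameter,
# evaluated below, at, and above the interpolation threshold p = n.
risks = {p: monte_carlo_predictive_risk(np.ones((n, p)), lam=1e-3) for p in (15, 60, 240)}
for p, r in risks.items():
    print(f"p/n = {p / n:.2f}: estimated predictive risk = {r:.3f}")
```

Replacing `np.ones((n, p))` with a non-constant profile makes it possible to compare how the risk curve changes shape across values of $p/n$, which is the kind of comparison the deterministic equivalents are designed to describe without simulation.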
