When does Gaussian equivalence fail and how to fix it: Non-universal behavior of random features with quadratic scaling
Garrett G. Wen, Hong Hu, Yue M. Lu, Zhou Fan, Theodor Misiakiewicz
TL;DR
This work identifies a non-universal breakdown of Gaussian equivalence for random feature (RF) models in the quadratic scaling regime, where the target depends on a low-dimensional projection of the data. It introduces the Conditional Gaussian Equivalent (CGE) model, which augments the Gaussian surrogate with a small, low-dimensional non-Gaussian component to capture the essential chaos components that Gaussian equivalence theory (GET) misses. The authors prove sharp asymptotics for the training and test errors under CGE using a two-phase Lindeberg swapping strategy and Malliavin-Stein-based CLTs, with an intermediary Partial Gaussian Equivalent (PGE) model bridging the gap. They further demonstrate that CGE accurately predicts phenomena such as generalized linear model behavior, phase transitions, interpolation thresholds, double descent, and benign overfitting for RF models in the quadratic regime, offering a robust framework beyond GET for universality in high-dimensional ERM.
Abstract
A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaussian equivalence theory (GET) has emerged as a powerful universality principle in this context: it states that the behavior of high-dimensional, complex features can be captured by Gaussian surrogates, which are more amenable to analysis. Despite its remarkable successes, numerical experiments show that this equivalence can fail even for simple embeddings, such as polynomial maps, under general scaling regimes. We investigate this breakdown in the setting of random feature (RF) models in the quadratic scaling regime, where both the number of features and the sample size grow quadratically with the data dimension. We show that when the target function depends on a low-dimensional projection of the data, as in generalized linear models, GET yields incorrect predictions. To capture the correct asymptotics, we introduce a Conditional Gaussian Equivalent (CGE) model, which can be viewed as appending a low-dimensional non-Gaussian component to an otherwise high-dimensional Gaussian model. This hybrid model retains the tractability of the Gaussian framework and accurately describes RF models in the quadratic scaling regime. We derive sharp asymptotics for the training and test errors in this setting, which continue to agree with numerical simulations even when GET fails. Our analysis combines general central limit theorem (CLT) results for Wiener chaos expansions with a careful two-phase Lindeberg swapping argument. Beyond RF models and quadratic scaling, our work hints at a rich landscape of universality phenomena in high-dimensional ERM.
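
To make the setting concrete, the following is a minimal simulation sketch of random-feature ridge regression in the quadratic scaling regime, with a target that depends on the data only through a one-dimensional projection. It is not the authors' experimental code. The ReLU activation, the unit-norm Gaussian weights, the standard Gaussian inputs, the squared loss with ridge penalty, the particular target function, and the function name `rf_ridge_errors` are all assumptions chosen for illustration.

```python
import numpy as np


def rf_ridge_errors(d=20, n_ratio=1.0, p_ratio=1.0, lam=1e-3, n_test=2000, seed=0):
    """Train/test MSE of random-feature ridge regression with n, p ~ d^2."""
    rng = np.random.default_rng(seed)
    n = int(n_ratio * d**2)  # sample size grows quadratically with d
    p = int(p_ratio * d**2)  # number of features grows quadratically with d

    # Hypothetical single-index target: depends on x only through <beta, x>.
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)

    def target(X):
        z = X @ beta
        return z + (z**2 - 1.0) / np.sqrt(2.0)  # linear plus quadratic Hermite term

    # Random first-layer weights, normalized to unit norm (an assumption).
    W = rng.standard_normal((p, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    def features(X):
        return np.maximum(X @ W.T, 0.0) / np.sqrt(p)  # ReLU features (an assumption)

    X_train = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    y_train, y_test = target(X_train), target(X_test)
    Phi_train, Phi_test = features(X_train), features(X_test)

    # Ridge ERM: minimize (1/n) * ||y - Phi a||^2 + lam * ||a||^2.
    A = Phi_train.T @ Phi_train + n * lam * np.eye(p)
    a_hat = np.linalg.solve(A, Phi_train.T @ y_train)

    train_err = float(np.mean((y_train - Phi_train @ a_hat) ** 2))
    test_err = float(np.mean((y_test - Phi_test @ a_hat) ** 2))
    return train_err, test_err


if __name__ == "__main__":
    # Sweep the feature ratio p / d^2 at a fixed sample ratio n / d^2 = 1.
    for p_ratio in (0.5, 1.0, 2.0):
        tr, te = rf_ridge_errors(d=20, p_ratio=p_ratio)
        print(f"p/d^2 = {p_ratio}: train MSE = {tr:.4f}, test MSE = {te:.4f}")
```

Empirical curves produced by a sweep of this kind are what an asymptotic prediction, whether from GET or from CGE, would be compared against. The sketch itself computes only the empirical errors and implements neither surrogate model.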
