Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

Jiping Li, Rishi Sonthalia

TL;DR

The paper addresses generalization in high-dimensional linear regression under a spiked covariance model, focusing on minimum-norm interpolants and how spike strength and target-spike alignment shape generalization. It develops an exact risk decomposition into Bias, Variance, Data Noise, and Target Alignment terms and uses this to classify regimes as benign, tempered, or catastrophic as the dimension parameter grows and spike scales vary. A key finding is that alignment with the spike is not universally beneficial: in well-specified aligned problems, increasing spike strength can drive transitions to catastrophic overfitting before benign overfitting appears, while misspecification and covariate shift can worsen or alter these transitions. The results extend beyond linear models, with nonlinear experiments (e.g., 3-layer ReLU nets) exhibiting similar alignment phase transitions, suggesting broad relevance for generalization in anisotropic data. Overall, the work provides a detailed map of how spectral structure and target alignment govern generalization in overparameterized regimes, challenging naive isotropic intuitions and informing model selection under spectral heterogeneity.

Abstract

This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression using spiked covariance data models. The paper characterizes how varying spike strengths and target-spike alignments can affect risk, especially in overparameterized settings. The study presents an exact expression for the generalization error, leading to a comprehensive classification of benign, tempered, and catastrophic overfitting regimes based on spike strength, the aspect ratio $c=d/n$ (particularly as $c \to \infty$), and target alignment. Notably, in well-specified aligned problems, increasing spike strength can surprisingly induce catastrophic overfitting before achieving benign overfitting. The paper also reveals that target-spike alignment is not always advantageous, identifying specific, sometimes counterintuitive, conditions for its benefit or detriment. Alignment with the spike being detrimental is empirically demonstrated to persist in nonlinear models.
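
The setting described above can be illustrated with a small simulation. The sketch below is not the paper's data model (its Signal, Noise, and Scaling assumptions are not reproduced in this summary). It assumes a generic spiked covariance $I + \theta\,{\bm{u}}{\bm{u}}^\top$, Gaussian label noise, and a target that is either equal to the spike direction or orthogonal to it. The function name `min_norm_excess_risk` and the parameters `theta` and `noise_std` are hypothetical, chosen only for illustration.

```python
import numpy as np

def min_norm_excess_risk(n, d, theta, aligned, noise_std=0.1, n_test=2000, seed=0):
    """Excess risk of the minimum-norm interpolant on toy spiked data.

    Assumption (not from the paper): rows of X are N(0, I + theta * u u^T).
    """
    rng = np.random.default_rng(seed)
    u = np.zeros(d)
    u[0] = 1.0                      # spike direction (unit vector)
    w = np.zeros(d)
    w[1] = 1.0                      # a direction orthogonal to the spike
    beta_star = u if aligned else w

    def sample(m):
        X = rng.standard_normal((m, d))
        # Stretch the component along u so that Cov = I + theta * u u^T.
        X += (np.sqrt(1.0 + theta) - 1.0) * np.outer(X @ u, u)
        return X

    X = sample(n)
    y = X @ beta_star + noise_std * rng.standard_normal(n)

    # The pseudoinverse gives the minimum-l2-norm solution; when d > n and
    # X has full row rank, it fits the training labels exactly.
    beta_hat = np.linalg.pinv(X) @ y

    X_test = sample(n_test)
    excess = float(np.mean((X_test @ (beta_hat - beta_star)) ** 2))
    train_resid = float(np.max(np.abs(X @ beta_hat - y)))
    return excess, train_resid

if __name__ == "__main__":
    n = 50
    for c in (2, 4, 8):             # aspect ratio c = d / n
        for aligned in (True, False):
            risk, resid = min_norm_excess_risk(n, c * n, theta=10.0, aligned=aligned)
            print(f"c={c} aligned={aligned} excess_risk={risk:.3f} train_resid={resid:.1e}")
```

Sweeping `theta` and `c` in such a toy model is one way to see how spike strength and alignment interact, but the exact phase boundaries depend on the paper's scaling assumptions and should be taken from its theorems.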

Paper Structure

This paper contains 72 sections, 49 theorems, 487 equations, 4 figures, 3 tables.

Key Result

Theorem 1

Suppose the data $({\bm{X}},{\bm{y}})$ and $(\tilde{{\bm{X}}}, \tilde{{\bm{y}}})$ are generated according to the Signal assumption, the Noise assumption, the Target Model equation, and the Scaling assumption. If the well-specification condition $\alpha_Z = \alpha_A = \dots$ ... where ${\bm{u}}$ is the unit vector defining the spike direction.

Figures (4)

  • Figure 1: Excess error vs. overparameterization ratio $c = d/n$ in the well-specified case. Each plot shows the risk for aligned and anti-aligned targets under different spike scaling regimes. The scatter plots are empirically obtained and the lines are theory.
  • Figure 2: Transition from beneficial to harmful alignment under mild misspecification. The scatter plots are empirically obtained and the lines are theory.
  • Figure 3: Phase boundaries for spike alignment impact. Coefficient of $({\bm{\beta}}_*^\top {\bm{u}})^2$ as a function of $\alpha_Z/\alpha_A$, indicating whether alignment improves or harms generalization.
  • Figure 4: Alignment-phase transitions persist in deep networks. Generalization error vs. angle between spike direction ${\bm{u}}$ and ground-truth parameter ${\bm{\beta}}_*$ when fitting data with a 3-layer ReLU network. The effect of alignment switches as $\alpha_Z$ increases, consistent with the phase transitions predicted by our theory. Experimental details are in the appendix.
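
Figure 4 plots error against the angle between ${\bm{u}}$ and ${\bm{\beta}}_*$. One simple way to build a unit-norm target at a prescribed angle to the spike is sketched below. This is an illustrative construction, not the authors' experimental procedure (which is described in the paper's appendix and not reproduced here). The helper name `target_at_angle` is hypothetical.

```python
import numpy as np

def target_at_angle(u, angle_rad, rng):
    """Return a unit vector whose angle to u equals angle_rad (hypothetical helper)."""
    u = u / np.linalg.norm(u)
    # Draw a random direction and remove its component along u.
    g = rng.standard_normal(u.shape)
    g -= (g @ u) * u
    v = g / np.linalg.norm(g)       # unit vector orthogonal to u
    # Rotate from u toward v by the requested angle.
    return np.cos(angle_rad) * u + np.sin(angle_rad) * v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(100)
    for deg in (0, 30, 60, 90):
        beta = target_at_angle(u, np.deg2rad(deg), rng)
        cos_sim = beta @ (u / np.linalg.norm(u))
        print(deg, round(float(cos_sim), 4))
```

With this parametrization, the squared alignment $({\bm{\beta}}_*^\top {\bm{u}})^2$ from Figure 3 equals $\cos^2$ of the angle, so an angle sweep traces alignment from fully aligned to orthogonal.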

Theorems & Definitions (95)

  • Remark 1: Generalizing Prior Work
  • Theorem 1: Well-Specified Risk
  • Remark 2
  • Theorem 2: Misspecified
  • Theorem 3
  • Theorem 4
  • Theorem 5: Generalization Risk
  • Theorem 5: Generalization Risk
  • Theorem 6: Theorems 3, 5 of meyer
  • Proposition 1: Proposition 2 from sonthalia2023training
  • ...and 85 more