Generalization for Least Squares Regression With Simple Spiked Covariances

Jiping Li, Rishi Sonthalia

TL;DR

This paper examines two simple models exhibiting spiked covariances and derives their generalization error in the asymptotic proportional regime, demonstrating that the eigenvector and eigenvalue corresponding to the spike significantly influence the generalization error.

Abstract

Random matrix theory has proven to be a valuable tool in analyzing the generalization of linear models. However, the generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. To understand the generalization performance of such networks, it is crucial to characterize the spectrum of the feature matrix at the hidden layer. Recent work has made progress in this direction by describing the spectrum after a single gradient step, revealing a spiked covariance structure. Yet, the generalization error for linear models with spiked covariances has not been previously determined. This paper addresses this gap by examining two simple models exhibiting spiked covariances. We derive their generalization error in the asymptotic proportional regime. Our analysis demonstrates that the eigenvector and eigenvalue corresponding to the spike significantly influence the generalization error.

Paper Structure

This paper contains 32 sections, 32 theorems, 194 equations, 3 figures.

Key Result

Theorem 1

Let $\{(n_k, d_k)\}_{k \in \mathbb{N}}$ be a sequence of pairs of integers such that $d_k/n_k \to c$ as $k \to \infty$. Suppose $\Sigma(d_k)$ and $X_k \in \mathbb{R}^{n_k \times d_k}$ has $n_k$ I.I.D. samples from $\mathcal{N}(0, \Sigma(d_k))$. If $\nu_{\Sigma}$ converges almost surely to $\nu_{H}$,

Figures (3)

  • Figure 1.1: Figure from moniri2023theory showing the singular values of $F_0 + P$. The bulk corresponds to $F_0$, while the spikes represent the effect of $P$.
  • Figure 4.1: The effect of the spike on the generalization error for finite matrices. Left: when the strength of the spike is large compared to the bulk, we see an effect that is not detected by the asymptotic risk. Right: the bulk and the spike have the same strength, and we do not see any effect of the spike on the risk.
  • Figure 4.2: The generalization error versus $c$ curve has a peak at $c = \frac{\tau^2_{A_{trn}}}{\tau^2_{A_{trn}}+\mu^2}$. For both figures $\mu=\tau_{\varepsilon_{trn}}=\theta_{trn}=\theta_{tst}=1$ and $d=1000$. Left: We set $\tau_{A_{trn}} = 1$, hence the peak should occur at $c = 1/2$. Right: We set $\tau_{A_{trn}} = 2$, hence the peak should occur at $c=4/5$.
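
The peak locations quoted in the Figure 4.2 caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `peak_aspect_ratio` and the variable names are hypothetical, and the conversion to a sample count assumes that $c$ in the figure is the same ratio $d/n$ that appears in Theorem 1.

```python
def peak_aspect_ratio(tau_a_trn: float, mu: float) -> float:
    """Peak location c = tau^2 / (tau^2 + mu^2), as stated in the Figure 4.2 caption."""
    tau_sq = tau_a_trn ** 2
    return tau_sq / (tau_sq + mu ** 2)


# Settings taken from the caption: mu = 1 and d = 1000 in both panels.
mu = 1.0
d = 1000

left_peak = peak_aspect_ratio(1.0, mu)   # left panel, tau_{A_trn} = 1
right_peak = peak_aspect_ratio(2.0, mu)  # right panel, tau_{A_trn} = 2

# Assumption: c = d / n, so the peak corresponds to n = d / c samples.
left_n = round(d / left_peak)
right_n = round(d / right_peak)

print(left_peak, right_peak)
print(left_n, right_n)
```

Under that assumption, the two panels would peak near $n = 2000$ and $n = 1250$ training samples respectively, which matches the caption's values $c = 1/2$ and $c = 4/5$.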

Theorems & Definitions (61)

  • Definition 1: Empirical Spectral Distribution (e.s.d.)
  • Definition 2: Stieltjes Transform
  • Theorem 1: Marenko1967DISTRIBUTIONOE
  • Example 1
  • Theorem 2: baik2006eigenvalues Theorem 1.1
  • Theorem 3: Risk for Signal Plus Noise Problem
  • Theorem 4: Risk for Signal Only Problem
  • Corollary 1: Non-Regularized Error
  • Lemma 1
  • proof
  • ...and 51 more