
Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

TL;DR

The paper studies overfitting in kernel ridgeless regression (KRR) with fixed input dimension by linking generalization to the kernel eigen-spectrum. It develops non-asymptotic test-error bounds under a sub-Gaussian design and derives bounds on the condition number of the kernel matrix, showing that the eigen-decay rate drives overfitting: polynomial decay yields tempered overfitting, while exponential decay yields catastrophic overfitting. It extends the analysis to dependent features and shows that feature independence plays an important role in guaranteeing tempered overfitting, which raises concerns about approximating KRR generalization with a Gaussian design. The results include tight upper bounds with matching lower bounds, complemented by experiments and a discussion of finite-rank kernel approximation, with implications for understanding benign overfitting and kernel design in finite regimes.

Abstract

We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.
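As a point of reference for the setting described in the abstract, the following is a minimal sketch of kernel ridgeless regression, that is, kernel interpolation with no ridge penalty. It uses the Laplacian kernel named in Figure 2; the target function, noise level, sample sizes, and the helper names `laplacian_kernel` and `ridgeless_fit_predict` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def laplacian_kernel(X, Z):
    # K(x, z) = exp(-||x - z||_2), evaluated for all pairs of rows of X and Z.
    dists = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=-1)
    return np.exp(-dists)

def ridgeless_fit_predict(kernel, X_train, y_train, X_test):
    # Ridgeless regression: solve K alpha = y with no regularization term.
    # lstsq returns the minimum-norm solution, which equals K^{-1} y
    # whenever the kernel matrix K is invertible.
    K = kernel(X_train, X_train)
    alpha, *_ = np.linalg.lstsq(K, y_train, rcond=None)
    return kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)

# Hypothetical target function and noise level, chosen only for illustration.
def target(X):
    return np.sin(np.pi * X[:, 0])

X_train = rng.uniform(-1.0, 1.0, size=(50, 2))
y_train = target(X_train) + 0.1 * rng.standard_normal(50)
X_test = rng.uniform(-1.0, 1.0, size=(200, 2))

y_pred = ridgeless_fit_predict(laplacian_kernel, X_train, y_train, X_test)
train_pred = ridgeless_fit_predict(laplacian_kernel, X_train, y_train, X_train)

# The interpolant fits the noisy labels (up to round-off); the test error
# against the noiseless target is the quantity of interest.
train_residual = np.max(np.abs(train_pred - y_train))
test_mse = np.mean((y_pred - target(X_test)) ** 2)
print(f"max train residual: {train_residual:.2e}, test MSE: {test_mse:.4f}")
```

Because the predictor interpolates noisy labels, its test error does not have to vanish as the sample size grows; how that error behaves is what the tempered versus catastrophic distinction describes.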

Paper Structure

This paper contains 32 sections, 26 theorems, 83 equations, 7 figures, 1 table.

Key Result

Theorem 4.1

Suppose $M,N\in\mathbb{N}$ satisfy $M\geq \eta N$ for some sufficiently large constant $\eta>1$. Let $\Psi\in\mathbb{R}^{M\times N}$ be a matrix whose columns $\Psi_i$ are i.i.d. isotropic random vectors with independent sub-Gaussian entries. Let $\bm{\Lambda}=\mathop{\mathrm{diag}}\nolimits(\la

Figures (7)

  • Figure 1: Kernel spectra for Laplacian and Gaussian kernels and their overfitting behaviours. Tempered Overfitting: The empirical kernel spectrum of the Laplacian kernel decays moderately (top left), and so does the quality of its test-set performance as one departs from the training data (top right). Catastrophic Overfitting: The Gaussian kernel exhibits rapid spectral decay (bottom left), and so does the reliability of its test-set performance for inputs far from the training data (bottom right).
  • Figure 2: Test error of kernel interpolation on the unit 2-disk against the sample size $N$. Left: Laplacian kernel $K(x,z)=e^{-\left\|x-z\right\|_{2}}$. Right: ReLU neural tangent kernel (NTK) for a 1-hidden-layer network.
  • Figure 3: Validation of Theorem \ref{theorem:main:1}: The ratios $\frac{s_{\max}}{s_{\min}} : \frac{\lambda_1}{\lambda_N}$ for the polynomial spectrum (left) and $\frac{s_{\max}}{s_{\min}} : \frac{N\lambda_1}{\lambda_N}$ for the exponential spectrum (right) are asymptotically constant.
  • Figure 4: Validation of Theorems \ref{theorem:main:2} and \ref{theorem:main:3}: Learning curves for spectra with polynomial (left) and exponential (right) decays.
  • Figure 5: Empirical singular values are close to the eigen-spectrum for independent features. Top: The features are Gaussian $\psi\sim\mathcal{N}(0,\mathbf{I}_M)$. Bottom: The features are uniformly distributed $\psi\sim(\mathop{\mathrm{unif}}\nolimits[-\sqrt{3},+\sqrt{3}])^p$.
  • ...and 2 more figures
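
The ratios described in the Figure 3 caption can be explored with a small simulation. The sketch below is not the authors' experiment. It assumes that $s_{\max}/s_{\min}$ denotes the condition number of $\Psi^\top\bm{\Lambda}\Psi$ (the theorem statement above is truncated, so this reading is an assumption), uses standard Gaussian entries as one example of an isotropic sub-Gaussian design, and picks $\eta=10$, $\lambda_k=k^{-2}$, and $\lambda_k=e^{-k}$ purely for illustration. The function name `condition_number` is hypothetical.

```python
import numpy as np

def condition_number(eigvals, N, rng):
    # Psi is M x N with i.i.d. standard Gaussian entries (an assumed design).
    M = eigvals.shape[0]
    Psi = rng.standard_normal((M, N))
    # The singular values of Lambda^{1/2} Psi, squared, are the eigenvalues
    # of Psi^T Lambda Psi; going through the SVD avoids forming the
    # ill-conditioned N x N product explicitly.
    A = np.sqrt(eigvals)[:, None] * Psi
    s = np.linalg.svd(A, compute_uv=False) ** 2
    return s[0] / s[-1]

rng = np.random.default_rng(0)
eta = 10  # assumed over-parameterization ratio, M = eta * N
ratios = {}
for N in (5, 10, 20):
    k = np.arange(1, eta * N + 1, dtype=float)
    poly = k ** -2.0   # assumed polynomial spectrum
    expo = np.exp(-k)  # assumed exponential spectrum
    # Normalizations taken from the Figure 3 caption.
    r_poly = condition_number(poly, N, rng) / (poly[0] / poly[N - 1])
    r_expo = condition_number(expo, N, rng) / (N * expo[0] / expo[N - 1])
    ratios[N] = (r_poly, r_expo)
    print(f"N={N:3d}  polynomial ratio={r_poly:.3f}  exponential ratio={r_expo:.3f}")
```

Each value comes from a single random draw, so averaging over repetitions and using larger $N$ would be needed before reading anything into whether the printed ratios stabilize.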

Theorems & Definitions (42)

  • Remark 3.2
  • Theorem 4.1: Bounding the Condition Number
  • Theorem 4.2: Overfitting with Polynomial and Exponential Eigen-Decay
  • Theorem 4.3: Smallest singular value with dependent features
  • Lemma A.2: Bound on largest singular value (Theorem 9 in [koltchinskii2017concentration]; Theorem 1 in [zhivotovskiy2024dimension])
  • Remark A.3: Sub-Gaussian Condition
  • Lemma A.4: Lower bound of smallest singular value for polynomial spectrum
  • proof
  • Remark A.5: Dependence of features
  • Lemma A.6: Condition number for polynomial eigen-decay
  • ...and 32 more