Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius
TL;DR
The paper studies overfitting in kernel ridgeless regression (KRR) at fixed input dimension by linking generalization to the kernel eigenspectrum. It develops non-asymptotic test-error bounds under a sub-Gaussian design and derives bounds on the condition number of the kernel matrix, showing that the rate of eigenvalue decay determines the type of overfitting: polynomial decay yields tempered overfitting, while exponential decay yields catastrophic overfitting. It extends the analysis to dependent features and shows that feature independence plays a crucial role in guaranteeing tempered overfitting, which raises concerns about approximating KRR generalization with the Gaussian design assumption. The results include tight upper bounds with matching lower bounds, and they are complemented by experiments and a discussion of finite-rank kernel approximation, with implications for understanding benign overfitting and kernel design in finite regimes.
Abstract
We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.
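
The link between spectral decay and the conditioning of the kernel matrix can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: it assumes a finite-rank kernel whose features are i.i.d. standard Gaussian (one instance of an independent sub-Gaussian design), and the sample size, rank, decay exponents, and the helper name `kernel_condition_number` are all hypothetical choices made for illustration.

```python
import numpy as np

def kernel_condition_number(eigenvalues, n_samples, seed=0):
    """Condition number of K = Psi diag(eigenvalues) Psi^T.

    Assumption (illustrative only): the rows of Psi are i.i.d. standard
    Gaussian feature vectors, i.e. independent sub-Gaussian features.
    """
    rng = np.random.default_rng(seed)
    p = len(eigenvalues)
    features = rng.standard_normal((n_samples, p))   # Psi, shape (n, p)
    K = (features * eigenvalues) @ features.T        # kernel matrix, shape (n, n)
    return np.linalg.cond(K)                         # ratio of extreme singular values

# Over-parameterized regime: more features than samples (hypothetical sizes).
n, p = 15, 200
k = np.arange(1, p + 1, dtype=float)

poly_spectrum = k ** -2.0    # polynomial decay, lambda_k = k^(-2)
exp_spectrum = np.exp(-k)    # exponential decay, lambda_k = e^(-k)

cond_poly = kernel_condition_number(poly_spectrum, n)
cond_exp = kernel_condition_number(exp_spectrum, n)

print(f"polynomial decay:  cond(K) = {cond_poly:.3e}")
print(f"exponential decay: cond(K) = {cond_exp:.3e}")
```

Under these assumptions, the kernel matrix built from the exponentially decaying spectrum is far worse conditioned than the one built from the polynomially decaying spectrum, which is consistent with the qualitative contrast between catastrophic and tempered overfitting described above. The sketch only inspects conditioning and does not compute a test error.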
