Demystifying Spectral Bias on Real-World Data
Itay Lavie, Zohar Ringel
TL;DR
The paper tackles spectral bias in kernel ridge regression (KRR) and Gaussian processes (GPs) by introducing cross-dataset learnability. This uses an auxiliary, symmetry-respecting measure $q$ to bound learnability on real data without solving the intractable eigenproblem on the real data distribution. The paper derives a tight, practical bound that depends on the kernel's eigenvalues and eigenfunctions under $q$ and on the target's projection onto those eigenfunctions. Corollaries relate the bound to the sample complexity needed to reach a desired learnability and to performance under covariate shift. The authors provide theoretical guarantees, including a universal bound and a corollary lower bound on sample complexity. They validate the approach empirically on real datasets (CIFAR-10, Fashion-MNIST, MNIST) and add illustrative vignettes on linear regression on manifolds and Transformer copying-head tasks. The approach uses representation theory to exploit kernel symmetries, transferring favorable spectral properties from the idealized measure $q$ to real data. This gives a principled way to anticipate spectral bias and sample complexity, with potential extensions to ridgeless regression and broader architectures.
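The following minimal NumPy sketch makes the summary's ingredients concrete. It shows how kernel eigenvalues and a target's projection onto eigenfunctions combine into a learnability estimate. It rests on several assumptions. It uses the "eigenlearning" approximation from earlier KRR work, in which an effective ridge $\kappa$ solves $n = \sum_i \lambda_i/(\lambda_i+\kappa) + \delta/\kappa$ and mode $i$ has learnability $\lambda_i/(\lambda_i+\kappa)$. It also uses an RBF kernel and a Gaussian sample standing in for the data measure. All function names are hypothetical. This is not the paper's cross-dataset bound itself.

```python
import numpy as np

def rbf_gram(X, lengthscale=1.0):
    # Gram matrix of the Gaussian (RBF) kernel on the rows of X.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * lengthscale**2))

def operator_spectrum(X, lengthscale=1.0):
    # Eigenvalues of K / N approximate those of the kernel integral operator
    # under the sampling measure; returned largest first, clipped at zero.
    lam = np.linalg.eigvalsh(rbf_gram(X, lengthscale) / X.shape[0])
    return np.clip(lam[::-1], 0.0, None)

def solve_kappa(lam, n, ridge):
    # Assumed eigenlearning equation: n = sum_i lam_i/(lam_i+kappa) + ridge/kappa.
    # The right-hand side decreases in kappa, so bisect on a log scale (ridge > 0).
    lo, hi = 1e-15, 1e6
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if np.sum(lam / (lam + mid)) + ridge / mid > n:
            lo = mid  # kappa is below the root
        else:
            hi = mid  # kappa is above the root
    return np.sqrt(lo * hi)

def mode_learnability(lam, n, ridge=1e-3):
    # Predicted learnability of each eigenmode, L_i = lam_i / (lam_i + kappa).
    kappa = solve_kappa(lam, n, ridge)
    return lam / (lam + kappa)

def target_learnability(coeffs, L):
    # Target f = sum_i c_i phi_i: learnability is the c_i^2-weighted mean of L_i.
    w = np.asarray(coeffs, dtype=float) ** 2
    return float(np.sum(L * w) / np.sum(w))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))   # stand-in sample from the data measure
lam = operator_spectrum(X)
L = mode_learnability(lam, n=50)     # 50 training points
c_top = np.eye(len(lam))[0]          # target = top eigenfunction
c_low = np.eye(len(lam))[200]        # target = a low-eigenvalue eigenfunction
print(target_learnability(c_top, L), target_learnability(c_low, L))
```

With 50 training points, the top eigenfunction comes out nearly fully learnable and the 201st is essentially unlearned. This is the spectral bias described here. The paper's contribution is to obtain such eigenvalues and projections from an idealized $q$ rather than from the real data.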
Abstract
Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning, with recent applications to highly over-parameterized deep neural networks. The ability of these tools to learn a target function is directly related to the eigenvalues of their kernel sampled on the input data distribution. Targets supported on eigenfunctions with larger eigenvalues are more learnable. However, solving such eigenvalue problems on real-world data remains a challenge. Here, we consider cross-dataset learnability and show that one may use eigenvalues and eigenfunctions associated with highly idealized data measures to reveal spectral bias on complex datasets and bound learnability on real-world data. This allows us to leverage various symmetries that realistic kernels manifest to unravel their spectral bias.
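The sketch below illustrates how symmetry organizes a kernel spectrum under an idealized measure. It takes $q$ to be the uniform measure on the sphere $S^2$ and uses the dot-product kernel $k(x, y) = \exp(x \cdot y)$. Rotation invariance forces the eigenfunctions to be spherical harmonics, so eigenvalues come in degenerate blocks whose sizes are the harmonic multiplicities (1, 3, 5, ... on $S^2$). The kernel, dimension, sample size, and helper names are illustrative assumptions, not the paper's setup.

```python
from math import comb

import numpy as np

def sample_sphere(n, d, rng):
    # Uniform samples on S^{d-1}: normalize isotropic Gaussian draws.
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def harmonic_multiplicity(d, l):
    # Dimension of degree-l spherical harmonics on S^{d-1} (d >= 2).
    if l == 0:
        return 1
    return comb(l + d - 1, d - 1) - comb(l + d - 3, d - 1)

def empirical_eigenvalues(X, f):
    # Eigenvalues of K / N for the dot-product kernel k(x, y) = f(x . y),
    # approximating the operator spectrum under the uniform measure; largest first.
    K = f(X @ X.T)
    return np.linalg.eigvalsh(K / X.shape[0])[::-1]

def leading_blocks(lam, d, max_degree):
    # Cut the sorted spectrum into consecutive blocks of the predicted sizes.
    # Assumes eigenvalues decrease with degree, which holds for f = exp.
    blocks, start = [], 0
    for l in range(max_degree + 1):
        m = harmonic_multiplicity(d, l)
        blocks.append(lam[start:start + m])
        start += m
    return blocks

rng = np.random.default_rng(0)
X = sample_sphere(2000, 3, rng)
lam = empirical_eigenvalues(X, np.exp)
for l, block in enumerate(leading_blocks(lam, 3, 2)):
    print(l, np.round(block, 3))
```

By the Funk–Hecke formula, the population eigenvalues for this kernel are the modified spherical Bessel values $i_l(1)$. These are about 1.175 ($\sinh 1$), 0.368 ($1/e$), and 0.072 for $l = 0, 1, 2$. Each printed block should therefore cluster near one value, up to sampling noise. Knowing the eigenfunctions in closed form under $q$ is what makes the idealized measure useful.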
