The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks

James B. Simon, Madeline Dickens, Dhruva Karkada, Michael R. DeWeese

TL;DR

The paper introduces an eigenlearning framework for kernel ridge regression (KRR) built on a sharp conservation law for learnability: summed over any complete orthonormal basis of functions, total learnability is at most the number of training samples $n$, with equality when the ridge is zero. The authors derive closed-form, modewise estimators, most notably the eigenlearning equations, in which mode $i$ has learnability $\mathcal{L}_i=\lambda_i/(\lambda_i+\kappa)$ and $\kappa$ is fixed by the self-consistent condition $n=\sum_i \lambda_i/(\lambda_i+\kappa)+\delta/\kappa$, where $\lambda_i$ are the kernel eigenvalues and $\delta$ is the ridge parameter. Test risk, bias, variance, and related quantities are then expressed purely in terms of these eigenmode learnabilities. The framework is applied to the deep bootstrap, to the hardness of the parity problem for rotation-invariant kernels, and to a mean-squared-gradient (MSG) measure relevant to adversarial robustness. An explicit expression for $\kappa$ in terms of elementary symmetric polynomials also reveals a close analogy with the free Fermi gas. The approach simplifies prior derivations, and the conservation law holds exactly at finite sample size. Predictions are compared against experiments on synthetic domains and image datasets, for both kernel regression and wide neural networks.
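The two formulas quoted above can be turned into a short numerical sketch. The snippet below is only an illustration under stated assumptions, not the authors' code: the function name `solve_kappa`, the bisection solver, and the power-law spectrum `eigvals` are all hypothetical choices made here. It solves the self-consistent condition for $\kappa$ and then evaluates $\mathcal{L}_i=\lambda_i/(\lambda_i+\kappa)$.

```python
import numpy as np

def solve_kappa(eigvals, n, ridge=0.0):
    """Solve n = sum_i lam_i / (lam_i + kappa) + ridge / kappa for kappa > 0.

    Hypothetical helper: plain bisection, using the fact that the
    right-hand side is strictly decreasing in kappa.
    """
    lam = np.asarray(eigvals, dtype=float)
    if ridge == 0.0 and n >= np.count_nonzero(lam):
        raise ValueError("with zero ridge, n must be below the number of nonzero eigenvalues")

    def excess(kappa):
        return np.sum(lam / (lam + kappa)) + ridge / kappa - n

    # Upper bracket: the right-hand side is below (sum(lam) + ridge) / kappa,
    # so excess(hi) < 0 at this value of kappa.
    hi = (lam.sum() + ridge) / n
    # Lower bracket: shrink kappa until the right-hand side exceeds n.
    lo = hi
    while excess(lo) <= 0:
        lo /= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed toy spectrum (not from the paper): lam_i = 1 / i^2 for i = 1..50.
eigvals = 1.0 / np.arange(1, 51) ** 2
n, ridge = 10, 0.1

kappa = solve_kappa(eigvals, n, ridge)
learnabilities = eigvals / (eigvals + kappa)  # one value per eigenmode, each in (0, 1)
print(kappa, learnabilities[:5])
```

With `ridge=0.0` the learnabilities returned this way sum to `n`, which mirrors the zero-ridge case of the conservation law; with a positive ridge the sum falls short of `n` by `ridge / kappa`.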

Abstract

We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions. Test risk and other objects of interest are expressed transparently in terms of our conserved quantity evaluated in the kernel eigenbasis. We use our improved framework to: i) provide a theoretical explanation for the "deep bootstrap" of Nakkiran et al. (2020), ii) generalize a previous result regarding the hardness of the classic parity problem, iii) fashion a theoretical tool for the study of adversarial robustness, and iv) draw a tight analogy between KRR and a well-studied system in statistical physics.

Paper Structure

This paper contains 46 sections, 3 theorems, 75 equations, 11 figures, 3 tables.

Key Result

Proposition 3.1

The following properties of $\mathcal{L}^{(\mathcal{D})}$, $\mathcal{L}$, $\{\phi_i\}$, and any $f$ such that $|\!| f |\!| = 1$ hold:

Figures (11)

  • Figure 1: Toy problem illustrating our conservation law. (A) The task domain: the unit circle discretized into $M=10$ points, $n$ of which comprise the dataset $\mathcal{D}$ (filled circles). (B) The 10 eigenfunctions of a rotation-invariant kernel on this domain, grouped into degenerate pairs and shifted vertically for clarity. (C) We use each eigenfunction $\phi_k$ in turn as the target function. For each $\phi_k$, we compute training targets $\phi_k(\mathcal{D})$, obtain a predicted function $\hat{f}_k$ in a standard supervised learning setup, and subsequently compute $\mathcal{D}$-learnability. This comprises 10 orthogonal learning problems. (D,E) Stacked bar charts with 10 components showing $\mathcal{D}$-learnability for each eigenfunction. The left bar in each pair contains results from NTK regression, while the right bar contains results from wide neural networks. Models vary in activation function and number of hidden layers (HL). Dashed lines indicate $n$. Learnabilities always sum to $n$, exactly for kernel regression and approximately for wide networks.
  • Figure 2: Predicted learnabilities and MSEs closely match experiment. (A-D) Learnability of various eigenfunctions on synthetic domains and binary functions over image datasets. Theoretical predictions from the eigenlearning equation for learnability (curves) are plotted against experimental values from trained finite networks (circles) and NTK regression (triangles) with varying dataset size $n$. Error bars show one standard deviation of variation. (E-H) Same as (A-D) for test MSE, with theoretical predictions from the eigenlearning equation for test MSE.
  • Figure 3: We reproduce and explain the deep bootstrap phenomenon in KRR. (A) An experiment illustrating the deep bootstrap effect using a ResNet-18 on CIFAR-10. (B) An analogous experiment using KRR on binarized MNIST. Eigenlearning predictions closely match experimental curves, and $\tau_{\text{eff}} = \kappa_0^{-1}$ (vertical dashed lines) faithfully predicts the transition from regularization-limited to data-limited fitting for each $n$.
  • Figure 4: Predicted function smoothness matches experiment. Predicted MSG of $\hat{f}$ (curves) and empirical MSG for kernel regression (triangles) for $k=1$ modes on hyperspheres with varying dimension.
  • Figure 5: Modewise learnabilities fall on universal sigmoidal curves. (A-F) Predicted learnability curve (sigmoidal curves) and empirical learnabilities for trained networks (circles) and NTK regression (triangles) for eigenmodes $k \in \{0,\ldots,7\}$ on three domains for $n=8,64$. Vertical dashed lines indicate $\kappa$. (G) All data from (A-F) with eigenvalues rescaled by $\kappa$.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Proposition 3.1
  • Theorem 3.2: Conservation of learnability
  • Lemma I.1
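
The conservation law of Theorem 3.2, as illustrated by the toy problem in Figure 1, can be checked numerically in a few lines. The sketch below is a hedged reconstruction rather than the authors' experiment: the kernel $\exp(\cos(\theta-\theta'))$, the function name `d_learnabilities`, and the random choice of training points are assumptions made here, and $\mathcal{D}$-learnability is taken to be the inner product $\langle \phi_k, \hat{f}_k \rangle$ under the uniform measure on the $M$ points, with $\phi_k$ normalized to unit norm.

```python
import numpy as np

def d_learnabilities(M=10, n=4, ridge=0.0, seed=0):
    """D-learnability of every kernel eigenfunction on a discretized circle.

    Assumed setup (not from the paper): kernel exp(cos(theta - theta')),
    uniform measure on the M points, n training points drawn at random.
    """
    rng = np.random.default_rng(seed)
    theta = 2 * np.pi * np.arange(M) / M
    # Rotation-invariant kernel: depends only on the angle difference.
    K = np.exp(np.cos(theta[:, None] - theta[None, :]))
    # Eigenfunctions w.r.t. the uniform measure, scaled so that <phi, phi> = 1.
    _, V = np.linalg.eigh(K / M)
    Phi = np.sqrt(M) * V
    train = rng.choice(M, size=n, replace=False)
    K_DD = K[np.ix_(train, train)]
    # Column k of F_hat is the KRR predictor trained on the targets phi_k(D).
    alpha = np.linalg.solve(K_DD + ridge * np.eye(n), Phi[train, :])
    F_hat = K[:, train] @ alpha
    # <phi_k, f_hat_k> under the uniform measure, one value per eigenfunction.
    return np.sum(Phi * F_hat, axis=0) / M

learn = d_learnabilities(M=10, n=4, ridge=0.0)
learn_ridge = d_learnabilities(M=10, n=4, ridge=0.5)
print(learn.sum(), learn_ridge.sum())
```

At zero ridge the ten values sum to `n` up to floating-point error, whichever training points are drawn; with a positive ridge the sum is strictly smaller than `n`.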