On the Saturation Effect of Kernel Ridge Regression

Yicheng Li; Haobo Zhang; Qian Lin

TL;DR

This work proves a long-standing conjecture about the saturation effect in kernel ridge regression (KRR): when the target function lies in a highly smooth interpolation space $[\mathcal{H}]^{\alpha}$ with $\alpha\ge 2$, the generalization error of KRR cannot decay faster than $n^{-{2}/{(2+\beta)}}$, where $\beta$ characterizes the eigenvalue decay of the kernel. The authors establish this via a bias-variance decomposition, showing $\mathbf{Bias}^2=\Omega(\lambda^2)$ and $\mathbf{Var}=\Omega(\lambda^{-\beta}/n)$, and carefully relating the empirical and population operators to derive the lower bounds. The main result demonstrates a gap between information-theoretic lower bounds and KRR upper bounds for smooth targets, confirming the saturation phenomenon, and is supported by numerical experiments contrasting KRR with gradient flow and other spectral methods. The findings illuminate intrinsic limits of KRR and motivate using non-saturating spectral regularization techniques in settings with very smooth underlying functions.
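
The way the two lower bounds combine into the saturation rate can be sketched numerically. The snippet below is only an illustration, not the paper's proof: it assumes all constants hidden in $\Omega(\cdot)$ equal 1, and the function names (`lower_bound`, `balancing_lambda`, `empirical_exponent`) are hypothetical. Minimizing $\lambda^2 + \lambda^{-\beta}/n$ over $\lambda$ gives $\lambda^{2+\beta} = \beta/(2n)$, so the minimized bound should scale as $n^{-2/(2+\beta)}$.

```python
import numpy as np


def lower_bound(lam, n, beta):
    """Illustrative bound: lam^2 (bias) + lam^(-beta) / n (variance).

    Assumption: the constants hidden in the Omega(.) bounds are set to 1.
    """
    return lam ** 2 + lam ** (-beta) / n


def balancing_lambda(n, beta):
    """Minimizer of lower_bound in lam, from setting the derivative to zero:
    2 * lam = beta * lam^(-beta - 1) / n, i.e. lam^(2 + beta) = beta / (2 n)."""
    return (beta / (2.0 * n)) ** (1.0 / (2.0 + beta))


def empirical_exponent(n_small, n_large, beta):
    """Decay exponent of the minimized bound between two sample sizes."""
    at_small = lower_bound(balancing_lambda(n_small, beta), n_small, beta)
    at_large = lower_bound(balancing_lambda(n_large, beta), n_large, beta)
    return np.log(at_small / at_large) / np.log(n_large / n_small)


if __name__ == "__main__":
    beta = 0.5  # hypothetical eigenvalue-decay parameter
    # The two printed numbers should agree up to floating-point error.
    print(empirical_exponent(1e3, 1e6, beta), 2.0 / (2.0 + beta))
```

Because no choice of $\lambda$ makes both terms small at once, the exponent $2/(2+\beta)$ does not improve however smooth the target is, which is the saturation described above.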

Abstract

The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying true function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound for KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.

Paper Structure (35 sections, 30 theorems, 187 equations, 2 figures, 3 tables)

Introduction
Related work
Notation.
Brief review of the saturation effect
Regression over Reproducing kernel Hilbert space
The saturation effect
Main Results
Sketch of the proof
The bias term
The variance term
Numerical Experiments
Conclusion
Basic facts in RKHS
Functions in RKHS
Sample subspace and semi-norm
...and 20 more sections

Key Result

Proposition 2.1

Suppose that $\mathcal{H}$ satisfies the eigenvalue decay condition (EigenDecay) and that $\mathcal{P}$ consists of all distributions satisfying conditions (B) and (C). i) The minimax rate of estimating $f_{\rho}^{*}$ is $n^{-\frac{1}{1+\beta}}$, where the infimum $\inf_{\hat{f}}$ is taken over all estimators and both the expectation $\mathbb{E}_\rho$ and the conditional mean $f^*_\rho$ depend on ...

Figures (2)

Figure 1: Error decay curves of KRR and GF. Both axes are logarithmic. The colored curves show the averaged error over 100 trials and the regions within one standard deviation are shown in green. The dashed black lines are computed using logarithmic least-squares and the slopes are reported as convergence rates.
Figure 2: Error decay curves of KRR and GF with kernel $(1-\|x-y\|)_+^4$ on $\mathbb{S}^2$ and $f^* = Y_1^1$.
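
The slope fitting mentioned in the caption of Figure 1 can be sketched as follows. This is a minimal illustration of logarithmic least squares, not the authors' experiment code: the function name `convergence_rate` and the synthetic error values are hypothetical. It fits a straight line to $(\log n, \log \text{error})$ and reports the negated slope as the convergence rate.

```python
import numpy as np


def convergence_rate(sample_sizes, errors):
    """Fit log(error) = intercept + slope * log(n) by ordinary least squares
    and return -slope, so that error ~ n^(-rate)."""
    log_n = np.log(np.asarray(sample_sizes, dtype=float))
    log_err = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_err, 1)
    return -slope


if __name__ == "__main__":
    # Synthetic, noise-free errors decaying like n^(-0.8); illustration only.
    ns = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
    errs = 3.0 * ns ** (-0.8)
    print(convergence_rate(ns, errs))
```

With real experiments the errors would be averaged over repeated trials before fitting, as the caption describes for the 100-trial averages.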

Theorems & Definitions (56)

Proposition 2.1: Optimality of KRR
Example 2.1: Sobolev RKHS [fischer2020_SobolevNorm]
Proposition 2.2: Saturation phenomenon of KRR
Theorem 3.1: Saturation effect
Remark 3.2
Lemma A.1
Definition A.2
Lemma A.3
proof
Definition A.4
...and 46 more
