Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

Haobo Zhang; Yicheng Li; Weihao Lu; Qian Lin

TL;DR

This work develops a general framework to characterize kernel ridge regression in high-dimensional settings where the sample size scales as $n\asymp d^{\gamma}$ and the target lies in an interpolation space $[\mathcal{H}]^{s}$ with $s>0$. By introducing capacity- and source-condition-dependent quantities $\mathcal{N}_{1}, \mathcal{N}_{2}, \mathcal{M}_{1}, \mathcal{M}_{2}$, the authors derive matching upper and lower bounds for the generalization error with an optimally chosen regularization parameter $\lambda$, and show minimax optimality for $0< s\le 1$ while revealing a saturation effect for $s>1$ in large dimensions. The results exhibit periodic plateau and multiple-descent behavior of the learning curves as $\gamma$ varies and unify prior findings at $s=0$ and $s=1$. Applied to inner-product kernels on the sphere and to neural tangent kernels, the paper provides exact rates for all $s>0$ and clarifies how high-dimensional geometry and smoothness interact with learning, with implications for understanding kernel methods and neural networks in high dimensions. Overall, the work advances precise rate theory for kernel methods in large dimensions and highlights new saturation phenomena beyond classical fixed-dimension theory.
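For orientation, the two objects named above can be written out using their standard textbook definitions. This is a sketch under stated assumptions: the symbols $\lambda_{i}$, $e_{i}$, $a_{i}$, $K(\cdot,\cdot)$, $X$ and $y$ are introduced here for illustration, and the paper's own normalization may differ.

```latex
% KRR estimator with regularization parameter \lambda, samples (x_i, y_i), i = 1, ..., n
\hat{f}_{\lambda}
  = \operatorname*{arg\,min}_{f \in \mathcal{H}}
    \left( \frac{1}{n} \sum_{i=1}^{n} \bigl( y_{i} - f(x_{i}) \bigr)^{2}
           + \lambda \, \| f \|_{\mathcal{H}}^{2} \right),
\qquad
\hat{f}_{\lambda}(x) = K(x, X) \bigl( K(X, X) + n \lambda I \bigr)^{-1} y .

% Interpolation space, with (\lambda_i, e_i) the eigenpairs of the integral operator of the kernel
[\mathcal{H}]^{s}
  = \Bigl\{ \sum_{i} a_{i} \, \lambda_{i}^{s/2} \, e_{i} \;:\; \sum_{i} a_{i}^{2} < \infty \Bigr\},
\qquad
[\mathcal{H}]^{1} = \mathcal{H},
\quad
[\mathcal{H}]^{0} \subseteq L^{2} .
```

Under these definitions, a larger $s$ means a smoother target, and the endpoints $s=0$ and $s=1$ correspond to targets in $L^{2}$ and in $\mathcal{H}$ respectively.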

Abstract

Motivated by studies of neural networks (e.g., the neural tangent kernel theory), we study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determine the exact order (both upper and lower bounds) of the generalization error of KRR for the optimally chosen regularization parameter $\lambda$. We then show that when $0<s\le1$, KRR is minimax optimal, and when $s>1$, KRR is not minimax optimal (a.k.a. the saturation effect). Our results illustrate that the curves of the rate varying along $\gamma$ exhibit periodic plateau behavior and multiple descent behavior, and show how the curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$ respectively.
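The setting in the abstract can be illustrated with a small numerical sketch. It is not the authors' code or experiment: the kernel $k(x,y)=\exp(\langle x,y\rangle)$, the target function, the noise level, and the helper names `sample_sphere`, `inner_product_kernel` and `krr_fit_predict` are all assumptions chosen for illustration. It draws $n \approx d^{\gamma}$ points on $\mathbb{S}^{d}$, fits KRR in closed form, and reports the test error for a few values of $\lambda$.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^d embedded in R^(d+1)."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def inner_product_kernel(a, b):
    """Illustrative inner-product kernel k(x, y) = exp(<x, y>) (an assumption)."""
    return np.exp(a @ b.T)


def krr_fit_predict(x_train, y_train, x_test, lam):
    """Closed-form KRR: f(x) = K(x, X) (K(X, X) + n * lam * I)^{-1} y."""
    n = x_train.shape[0]
    k_train = inner_product_kernel(x_train, x_train)
    alpha = np.linalg.solve(k_train + n * lam * np.eye(n), y_train)
    return inner_product_kernel(x_test, x_train) @ alpha


def target(x):
    """Hypothetical low-degree target function, chosen only for illustration."""
    return x[:, 0] + x[:, 0] * x[:, 1]


rng = np.random.default_rng(0)
d, gamma = 10, 1.5
n = int(round(d ** gamma))  # sample size scaling n ~ d^gamma

x_train = sample_sphere(n, d, rng)
x_test = sample_sphere(200, d, rng)
y_train = target(x_train) + 0.1 * rng.standard_normal(n)  # noisy labels

for lam in (1e-4, 1e-2, 1.0):
    pred = krr_fit_predict(x_train, y_train, x_test, lam)
    err = np.mean((pred - target(x_test)) ** 2)
    print(f"lambda={lam:g}  test MSE={err:.4f}")
```

A single run at one small $d$ cannot exhibit the asymptotic rates. Doing so would require repeating the experiment over a range of $d$ and $\gamma$ and tuning $\lambda$ for each pair.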

Paper Structure (23 sections, 39 theorems, 222 equations, 3 figures)

Introduction
Related work
Preliminaries
Integral operator and interpolation space
Main results
KRR's generalization error in the general case
Applications to inner product kernel on the sphere
Applications to neural tangent kernel
Conclusion and discussion
Proof of Theorem 1 (main theorem)
Bias-variance decomposition
Variance term
Bias term
Final proof of Theorem 1 (main theorem)
Proof of inner product kernel
...and 8 more sections

Key Result

Theorem 1

Let $\mathcal{N}_{1}, \mathcal{N}_{2}, \mathcal{M}_{1}, \mathcal{M}_{2}$ be defined as in the paper, and let $d=d(n)$, which is allowed to diverge as $n \to \infty$. Suppose that the assumptions on the kernel, the noise, and the eigenfunctions hold. Let $\hat{f}_{\lambda}$ be the KRR estimator ...; then we have ... The notation $\Theta_{\mathbb{P}}$ only involves absolute constants.

Figures (3)

Figure 1: Left: The curve of generalization error for estimating $f_{\rho}^{*} \in L^{2}$ (Figure 5 in Ghorbani2019LinearizedTN). Right: The curve of the minimax rates for estimating $f_{\rho}^{*} \in \mathcal{H}$ (Figure 2(b) in lu2023optimal).
Figure 2: Convergence rates of KRR in Theorem 2 and Theorem 3, and the corresponding minimax lower rates in Theorem 5 (ignoring an $\epsilon$-difference), with respect to dimension $d$. We present 6 graphs corresponding to 6 source conditions: $s = 0.01, 0.5, 1.0, 1.5, 2.0, 2.5$. The x-axis represents the asymptotic scaling, $\gamma: n \asymp d^{\gamma}$; the y-axis represents the convergence rate of the generalization error, $r: \text{error} \asymp d^{r}$.
Figure 3: Convergence rates of KRR in Theorem 2 and Theorem 3, and the corresponding minimax lower rates in Theorem 5 (ignoring an $\epsilon$-difference), with respect to sample size $n$. We present 3 graphs corresponding to 3 source conditions: $s = 0.5, 1.5, 2.5$. The x-axis represents the asymptotic scaling, $\gamma: n \asymp d^{\gamma}$; the y-axis represents the convergence rate of the generalization error, $r: \text{error} \asymp n^{r}$.
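
Figures 2 and 3 report the same rates in two parametrizations, and the conversion between them follows directly from $n \asymp d^{\gamma}$. In the short derivation below, the subscripted exponents $r_{d}$ and $r_{n}$ are notation introduced here, and the numerical values in the last line are a hypothetical example, not a result from the paper.

```latex
% Rate in d (Figure 2 axis) versus rate in n (Figure 3 axis), using n \asymp d^{\gamma}
\text{error} \asymp d^{\,r_{d}}
\quad\text{and}\quad
d \asymp n^{1/\gamma}
\;\Longrightarrow\;
\text{error} \asymp n^{\,r_{d}/\gamma},
\qquad\text{so}\qquad
r_{n} = \frac{r_{d}}{\gamma} .

% Hypothetical example: r_d = -1 and \gamma = 2 give r_n = -1/2
```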

Theorems & Definitions (45)

Theorem 1
Theorem 2: Exact convergence rates when $\mathbf{s \ge 1}$
Theorem 3: Exact convergence rates when $\mathbf{0 < s < 1}$
Remark 4
Theorem 5: Minimax lower bound
Theorem 6: NTK: exact convergence rates when $\mathbf{s \ge 1}$
Theorem 7: NTK: exact convergence rates when $\mathbf{0 < s < 1}$
Theorem 8: NTK: minimax lower bound
Lemma 9
Lemma 10: Approximation B
...and 35 more