Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

Haobo Zhang, Yicheng Li, Weihao Lu, Qian Lin

TL;DR

This work develops a general framework to characterize kernel ridge regression in high-dimensional settings where the sample size scales as $n\asymp d^{\gamma}$ and the target lies in an interpolation space $[\mathcal{H}]^{s}$ with $s>0$. By introducing capacity- and source-condition-dependent quantities $\mathcal{N}_{1}, \mathcal{N}_{2}, \mathcal{M}_{1}, \mathcal{M}_{2}$, the authors derive matching upper and lower bounds for the generalization error with an optimally chosen regularization parameter $\lambda$, and show minimax optimality for $0< s\le 1$ while revealing a saturation effect for $s>1$ in large dimensions. The results exhibit periodic plateau and multiple-descent behavior of the learning curves as $\gamma$ varies and unify prior findings at $s=0$ and $s=1$. Applied to inner-product kernels on the sphere and to neural tangent kernels, the paper provides exact rates for all $s>0$ and clarifies how high-dimensional geometry and smoothness interact with learning, with implications for understanding kernel methods and neural networks in high dimensions. Overall, the work advances precise rate theory for kernel methods in large dimensions and highlights new saturation phenomena beyond classical fixed-dimension theory.

Abstract

Motivated by studies of neural networks (e.g., the neural tangent kernel theory), we study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determine the exact order (both upper and lower bound) of the generalization error of kernel ridge regression for the optimally chosen regularization parameter $\lambda$. We then show that when $0<s\le 1$, KRR is minimax optimal, and that when $s>1$, KRR is not minimax optimal (a.k.a. the saturation effect). Our results illustrate that the curves of the rate as a function of $\gamma$ exhibit periodic plateau behavior and multiple descent behavior, and show how these curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$ respectively.
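To make the setting concrete, the following is a minimal sketch of kernel ridge regression with an inner-product kernel on the sphere $\mathbb{S}^{d}$ and a sample size set by $n \asymp d^{\gamma}$. It is not the authors' experiment: the kernel $k(x, y) = \exp(\langle x, y \rangle)$, the target function, the noise level, the value of $\lambda$, and all function names are assumptions chosen only for illustration. The estimator uses the standard closed form $\hat{f}_{\lambda}(x) = K(x, X)\,(K(X, X) + n\lambda I)^{-1} y$.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^d, embedded in R^(d+1)."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def inner_product_kernel(a, b):
    """Illustrative inner-product kernel k(x, y) = exp(<x, y>) (an assumption)."""
    return np.exp(a @ b.T)


def krr_fit(x, y, lam):
    """Return the coefficients alpha solving (K + n * lam * I) alpha = y."""
    n = x.shape[0]
    k = inner_product_kernel(x, x)
    return np.linalg.solve(k + n * lam * np.eye(n), y)


def krr_predict(x_train, alpha, x_new):
    """Evaluate the fitted function at new points: K(x_new, x_train) @ alpha."""
    return inner_product_kernel(x_new, x_train) @ alpha


def target(x):
    """Hypothetical smooth target: a low-degree polynomial in the coordinates."""
    return x[:, 0] + x[:, 0] * x[:, 1]


rng = np.random.default_rng(0)
d, gamma = 10, 1.5
n = int(round(d ** gamma))  # sample size scaling n ~ d^gamma

x_train = sample_sphere(n, d, rng)
y_train = target(x_train) + 0.1 * rng.standard_normal(n)  # assumed noise level

alpha = krr_fit(x_train, y_train, lam=1e-2)  # lam is an arbitrary choice here

# Monte Carlo estimate of the excess risk on fresh points from the sphere.
x_test = sample_sphere(2000, d, rng)
mse = np.mean((krr_predict(x_train, alpha, x_test) - target(x_test)) ** 2)
print(f"n = {n}, d = {d}, estimated excess risk = {mse:.4f}")
```

Repeating this over a grid of $\gamma$ and $\lambda$ values would give an empirical counterpart to the rate curves, though small $d$ only loosely reflects the asymptotic regime the paper analyzes.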

Paper Structure (23 sections, 39 theorems, 222 equations, 3 figures)

Key Result

Theorem 1

Let $\mathcal{N}_{1}, \mathcal{N}_{2}, \mathcal{M}_{1}, \mathcal{M}_{2}$ be as defined in the paper, and let $d=d(n)$ be allowed to diverge as $n \to \infty$. Suppose that the assumptions on the kernel, the noise, and the eigenfunctions hold. Let $\hat{f}_{\lambda}$ be the KRR estimator. Then we have [equation not recovered]. The notation $\Theta_{\mathbb{P}}$ only involves absolute constants.

Figures (3)

  • Figure 1: Left: The curve of generalization error for estimating $f_{\rho}^{*} \in L^{2}$ (Figure 5 in Ghorbani et al., 2019). Right: The curve of the minimax rates for estimating $f_{\rho}^{*} \in \mathcal{H}$ (Figure 2(b) in Lu et al., 2023).
  • Figure 2: Convergence rates of KRR in Theorem 2 and Theorem 3, and the corresponding minimax lower rates in Theorem 5 (ignoring an $\epsilon$-difference), with respect to dimension $d$. We present 6 graphs corresponding to 6 source conditions: $s = 0.01, 0.5, 1.0, 1.5, 2.0, 2.5$. The x-axis represents the asymptotic scaling, $\gamma: n \asymp d^{\gamma}$; the y-axis represents the convergence rate of the generalization error, $r: \text{error} \asymp d^{r}$.
  • Figure 3: Convergence rates of KRR in Theorem 2 and Theorem 3, and the corresponding minimax lower rates in Theorem 5 (ignoring an $\epsilon$-difference), with respect to sample size $n$. We present 3 graphs corresponding to 3 source conditions: $s = 0.5, 1.5, 2.5$. The x-axis represents the asymptotic scaling, $\gamma: n \asymp d^{\gamma}$; the y-axis represents the convergence rate of the generalization error, $r: \text{error} \asymp n^{r}$.
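
Figures 2 and 3 report the same rates against different bases, so their y-axes are related by a rescaling. Writing $r_{d}$ and $r_{n}$ for the exponents with respect to $d$ and $n$ (subscripts introduced here only for illustration; the captions use $r$ for both), the scaling $n \asymp d^{\gamma}$ gives

$$\text{error} \asymp d^{r_{d}} = \left(d^{\gamma}\right)^{r_{d}/\gamma} \asymp n^{r_{d}/\gamma}, \qquad \text{so} \qquad r_{n} = \frac{r_{d}}{\gamma}.$$

For example, at $\gamma = 2$ a rate of $d^{-1}$ corresponds to $n^{-1/2}$.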

Theorems & Definitions (45)

  • Theorem 1
  • Theorem 2: Exact convergence rates when $\mathbf{s \ge 1}$
  • Theorem 3: Exact convergence rates when $\mathbf{0 < s < 1}$
  • Remark 4
  • Theorem 5: Minimax lower bound
  • Theorem 6: NTK: exact convergence rates when $\mathbf{s \ge 1}$
  • Theorem 7: NTK: exact convergence rates when $\mathbf{0 < s < 1}$
  • Theorem 8: NTK: minimax lower bound
  • Lemma 9
  • Lemma 10: Approximation B
  • ...and 35 more