Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions
Haobo Zhang, Yicheng Li, Weihao Lu, Qian Lin
TL;DR
This work develops a general framework for characterizing kernel ridge regression in large-dimensional settings where the sample size scales as $n\asymp d^{\gamma}$ and the target lies in an interpolation space $[\mathcal{H}]^{s}$ with $s>0$. By introducing capacity- and source-condition-dependent quantities $\mathcal{N}_{1}, \mathcal{N}_{2}, \mathcal{M}_{1}, \mathcal{M}_{2}$, the authors derive matching upper and lower bounds on the generalization error under an optimally chosen regularization parameter $\lambda$. They prove minimax optimality for $0< s\le 1$ and reveal a saturation effect for $s>1$ in large dimensions. The resulting learning curves exhibit periodic plateau and multiple descent behavior as $\gamma$ varies, and they unify prior findings at $s=0$ and $s=1$. Applied to inner product kernels on the sphere and to neural tangent kernels, the framework yields exact rates for all $s>0$ and clarifies how dimension and smoothness jointly determine these rates, with implications for understanding kernel methods and neural networks in high dimensions. Overall, the work sharpens the rate theory of kernel methods in large dimensions and highlights a saturation phenomenon beyond classical fixed-dimension theory.
Abstract
Motivated by studies of neural networks (e.g., the neural tangent kernel theory), we study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size satisfies $n \asymp d^{\gamma}$ for some $\gamma > 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determine the exact order (both upper and lower bounds) of the generalization error of kernel ridge regression with an optimally chosen regularization parameter $\lambda$. We then show that KRR is minimax optimal when $0<s\le1$ and is not minimax optimal when $s>1$ (a.k.a. the saturation effect). Our results illustrate that the rate curves, viewed as functions of $\gamma$, exhibit periodic plateau behavior and multiple descent behavior, and they show how these curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint on several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$, respectively.
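The abstract concerns KRR with an inner product kernel on the sphere, sample size $n \asymp d^{\gamma}$, and an optimally tuned $\lambda$. The following is a minimal Python/NumPy sketch of that estimator under illustrative assumptions that are not taken from the paper: the kernel $k(x,z)=\exp(\langle x,z\rangle)$, the target $f^{*}(x)=\sqrt{d+1}\,x_1$, Gaussian label noise with standard deviation 0.1, and the helper names `sample_sphere`, `krr_fit_predict`, and `excess_risk`, all of which are hypothetical. It only shows where $n$, $d$, $\gamma$, and $\lambda$ enter; it does not reproduce the paper's rates.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^d, embedded in R^(d+1)."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def inner_product_kernel(x, z):
    """Illustrative inner product kernel k(x, z) = exp(<x, z>) (an assumption)."""
    return np.exp(x @ z.T)


def krr_fit_predict(x_train, y_train, x_test, lam):
    """KRR: minimize (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2.

    By the representer theorem, the minimizer is
    f(x) = k(x, X) @ (K + n * lam * I)^{-1} y.
    """
    n = x_train.shape[0]
    k_train = inner_product_kernel(x_train, x_train)
    alpha = np.linalg.solve(k_train + n * lam * np.eye(n), y_train)
    return inner_product_kernel(x_test, x_train) @ alpha


def excess_risk(d, gamma, lam, noise=0.1, n_test=2000, seed=0):
    """Monte Carlo estimate of ||f_hat - f_star||_{L^2}^2 with n = floor(d^gamma)."""
    rng = np.random.default_rng(seed)
    n = int(d ** gamma)
    x_train = sample_sphere(n, d, rng)
    x_test = sample_sphere(n_test, d, rng)

    def f_star(x):
        # Hypothetical target: a degree-1 spherical harmonic with unit L^2 norm,
        # since E[x_1^2] = 1 / (d + 1) under the uniform distribution on S^d.
        return np.sqrt(d + 1) * x[:, 0]

    y_train = f_star(x_train) + noise * rng.standard_normal(n)
    pred = krr_fit_predict(x_train, y_train, x_test, lam)
    return np.mean((pred - f_star(x_test)) ** 2)


if __name__ == "__main__":
    d, gamma = 20, 1.5  # n = floor(20 ** 1.5) = 89 training points
    lams = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
    risks = [excess_risk(d, gamma, lam) for lam in lams]
    # Oracle choice of lambda on test error: a stand-in for the "optimally chosen" lambda.
    best = lams[int(np.argmin(risks))]
    for lam, risk in zip(lams, risks):
        print(f"lambda={lam:.0e}  excess risk={risk:.4f}")
    print("oracle lambda:", best)
```

Normalizing Gaussian vectors gives uniform samples on the sphere, and picking $\lambda$ by test error is only an oracle proxy for the theoretical choice. Sweeping $\gamma$ at fixed $d$ is one way one might probe the plateau and multiple descent curves empirically, though a small simulation like this is not expected to resolve the asymptotic rates.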
