Optimal Rates and Saturation for Noiseless Kernel Ridge Regression

Jihao Long, Xiaojun Peng, Lei Wu

TL;DR

This paper studies kernel ridge regression (KRR) in the noiseless regime and shows that, up to logarithmic factors, noiseless KRR achieves minimax optimal convergence rates, jointly determined by the eigenvalue decay of the associated integral operator and the target function's smoothness.

Abstract

Kernel ridge regression (KRR), also known as the least-squares support vector machine, is a fundamental method for learning functions from finite samples. While most existing analyses focus on the noisy setting with constant-level label noise, we present a comprehensive study of KRR in the noiseless regime -- a critical setting in scientific computing where data are often generated via high-fidelity numerical simulations. We establish that, up to logarithmic factors, noiseless KRR achieves minimax optimal convergence rates, jointly determined by the eigenvalue decay of the associated integral operator and the target function's smoothness. These rates are derived under Sobolev-type interpolation norms, with the $L^2$ norm as a special case. Notably, we uncover two key phenomena: an extra-smoothness effect, where the KRR solution exhibits higher smoothness than typical functions in the native reproducing kernel Hilbert space (RKHS), and a saturation effect, where KRR's adaptivity to the target function's smoothness plateaus beyond a certain level. Leveraging these insights, we also derive a novel error bound for noisy KRR that is noise-level aware and achieves minimax optimality in both noiseless and noisy regimes. As a key technical contribution, we introduce a refined notion of degrees of freedom, which we believe has broader applicability in the analysis of kernel methods. Extensive numerical experiments validate our theoretical results and provide insights beyond existing theory.
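
To make the noiseless setting concrete, the following is a minimal NumPy sketch of KRR fitted to exact (noise-free) labels. It is an illustration under stated assumptions, not the authors' experimental code: the Laplace kernel, the target $\sin(2\pi x)$ on $[0,1]$, the sample size, and the function names `laplace_kernel`, `krr_fit`, and `krr_predict` are all hypothetical choices. It also assumes the common normalization $\frac{1}{n}\sum_i (f(x_i)-y_i)^2 + \lambda\|f\|_{\mathcal{H}}^2$, under which the coefficients solve $(K + n\lambda I)\alpha = y$.

```python
import numpy as np

def laplace_kernel(x, z, bandwidth=1.0):
    # Assumed kernel for illustration: k(x, z) = exp(-|x - z| / bandwidth).
    return np.exp(-np.abs(x[:, None] - z[None, :]) / bandwidth)

def krr_fit(x_train, y_train, lam, kernel=laplace_kernel):
    # Solve (K + n * lam * I) alpha = y for the KRR coefficients.
    n = x_train.shape[0]
    K = kernel(x_train, x_train)
    return np.linalg.solve(K + n * lam * np.eye(n), y_train)

def krr_predict(x_new, x_train, alpha, kernel=laplace_kernel):
    # Evaluate f(x) = sum_i alpha_i k(x, x_i) at the new points.
    return kernel(x_new, x_train) @ alpha

def target(x):
    # Hypothetical smooth target function.
    return np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=100)
y_train = target(x_train)              # noiseless labels: no noise term added
x_test = np.linspace(0.0, 1.0, 1000)

# Root-mean-square test error for a few regularization strengths.
errors = {}
for lam in (1e-1, 1e-3, 1e-6):
    alpha = krr_fit(x_train, y_train, lam)
    pred = krr_predict(x_test, x_train, alpha)
    errors[lam] = np.sqrt(np.mean((pred - target(x_test)) ** 2))
    print(f"lambda = {lam:.0e}   test RMSE = {errors[lam]:.3e}")
```

In this toy setup the heavily regularized fit is visibly worse than the weakly regularized ones, because without label noise the ridge term only contributes bias.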

Paper Structure (42 sections, 40 theorems, 197 equations, 2 figures)

This paper contains 42 sections, 40 theorems, 197 equations, 2 figures.

Key Result

Lemma 3.3

If $\mu_j\asymp j^{-\beta}$ with $\beta>1$, then $N_\gamma(\lambda)\asymp \lambda^{-1/\beta}$ for any $\gamma>1/\beta$. If $\mu_j\asymp c^{j}$ with $c\in (0,1)$, then $N_\gamma(\lambda)\asymp \log(1/\lambda)$ for any $\gamma>0$.
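
The polynomial-decay scaling in the lemma can be checked numerically. Definition 3.1 is not reproduced on this page, so the sketch below assumes the form $N_\gamma(\lambda)=\sum_j \big(\mu_j/(\mu_j+\lambda)\big)^\gamma$, which reduces to the classical degrees of freedom at $\gamma=1$; the paper's actual definition may differ. The function name `gamma_dof`, the truncation at $2\times 10^6$ eigenvalues, and the choice of exponent $2$ in $\mu_j=j^{-2}$ are likewise illustrative assumptions.

```python
import numpy as np

def gamma_dof(mu, lam, gamma):
    # ASSUMED definition (not taken from the paper):
    # N_gamma(lam) = sum_j (mu_j / (mu_j + lam)) ** gamma.
    return np.sum((mu / (mu + lam)) ** gamma)

decay = 2.0                                    # polynomial decay exponent
j = np.arange(1, 2_000_001, dtype=np.float64)  # truncated spectrum
mu = j ** (-decay)                             # mu_j = j^(-decay)

lams = np.logspace(-6, -2, 9)
slopes = {}
for gamma in (1.0, 2.0):                       # both satisfy gamma > 1/decay
    dofs = np.array([gamma_dof(mu, lam, gamma) for lam in lams])
    # Fit log N_gamma(lam) against log(1/lam); the slope estimates the
    # exponent in N_gamma(lam) ~ lam^(-slope).
    slope, _ = np.polyfit(np.log(1.0 / lams), np.log(dofs), 1)
    slopes[gamma] = slope
    print(f"gamma = {gamma}: fitted exponent = {slope:.3f}")
```

Under the assumed definition, the fitted exponents come out close to $1/2$, matching the $\lambda^{-1/\beta}$ scaling for a decay exponent of $2$.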

Figures (2)

  • Figure 1: The performance of noiseless KRR improves monotonically as $\lambda$ decreases. The target functions are $F_s^*$ with $s=0.5$ (left) and $s=2$ (right), and the sample size $n=100$.
  • Figure 2: The observed convergence rates align perfectly with our theoretical predictions for all examined pairs of $s$ and $p$. Left: Convergence rates of noiseless KRR for various values of $p$ when the target function is $F_\infty^*$. Middle: Convergence rates of noiseless KRR for various values of $s$, measured in $L^2$ norm (i.e., $p=0$). Right: The solution of noisy KRR for the target function $F_\infty^*$ (with noise level $\sigma=1$) exhibits smoothness of order strictly greater than $1$. In all panels, the dashed line represents the theoretical predictions, and each experiment is repeated 20 times with error bars indicating the standard deviation across runs. For the noiseless experiments (left and middle), we set $\lambda=10^{-20}$, sufficiently small following the guidance from Figure 1. For the noisy experiment (right), we set $\lambda=0.05n^{-\min(s,2)/(\min(s,2)\beta+1)}$, following the theoretical prescription given by the analysis in Eq. \ref{eqn: noisy-rate}, with $s=\infty$ and $\beta=2$.
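
The regularization schedule quoted in the Figure 2 caption, $\lambda=0.05\,n^{-\min(s,2)/(\min(s,2)\beta+1)}$, can be evaluated directly. The short sketch below is only a worked instance of that formula; the function name `noisy_lambda` and the sample sizes are hypothetical, and it assumes the decay exponent in the caption's setting is $\beta=2$. With $s=\infty$ the exponent becomes $2/(2\cdot 2+1)=2/5$.

```python
import math

def noisy_lambda(n, s, beta, prefactor=0.05):
    # lambda = prefactor * n^(-r / (r * beta + 1)) with r = min(s, 2).
    # Capping s at 2 reflects the saturation level in the quoted formula.
    r = min(s, 2.0)
    return prefactor * n ** (-r / (r * beta + 1.0))

beta = 2.0  # assumed eigenvalue decay exponent
for n in (100, 1_000, 10_000):
    lam = noisy_lambda(n, math.inf, beta)
    print(f"n = {n:>6}: lambda = {lam:.4e}")

# Any smoothness s >= 2 yields the same lambda because of the cap at 2.
same = noisy_lambda(100, 2.0, beta) == noisy_lambda(100, 10.0, beta)
print("s = 2 and s = 10 give identical lambda:", same)
```

Because of the $\min(s,2)$ cap, increasing the target smoothness beyond $2$ does not change the prescribed $\lambda$.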

Theorems & Definitions (76)

  • Remark 1.1: Polynomial decay
  • Remark 1.2: Exponential decay
  • Definition 2.2: Interpolation space
  • Definition 3.1: $\gamma$-DoFs
  • Remark 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 3.5
  • proof
  • Theorem 4.1
  • ...and 66 more