On the Saturation Effects of Spectral Algorithms in Large Dimensions
Weihao Lu, Haobo Zhang, Yicheng Li, Qian Lin
TL;DR
This work analyzes saturation phenomena for spectral algorithms in the large-dimension regime where $n$ scales as $d^{\gamma}$. It proves that kernel gradient flow with early stopping attains the minimax rate up to polylog factors, while kernel ridge regression (KRR) can be strictly suboptimal when the regression function is sufficiently smooth ($s>1$). By establishing exact convergence rates for a broad class of analytic spectral algorithms with qualification $\tau$, the paper uncovers saturation (for $s>\tau$), periodic plateau behavior (for $0<s\le 2\tau$), and a polynomial-approximation barrier as $s\to 0$, all in the context of inner-product kernels on the sphere. These results connect the saturation effect known for fixed $d$ with phenomena specific to large $d$ (multiple descent, plateaus, and barriers), and they have implications for understanding high-dimensional kernel methods and the lazy regime of neural networks through NTK-like kernels. The findings establish kernel gradient flow as minimax-optimal in large dimensions and delineate the precise regimes where KRR cannot achieve the minimax lower bound.
Abstract
The saturation effect, which originally refers to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the regression function is over-smooth, has been observed for almost 20 years and was recently proved rigorously for KRR and some other spectral algorithms over fixed-dimensional domains. The main focus of this paper is to explore saturation effects for a large class of spectral algorithms (including KRR, gradient descent, etc.) in large-dimensional settings where $n \asymp d^{\gamma}$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large-dimensional settings and show that gradient flow with an early-stopping strategy yields an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results for KRR, we further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimally tuned) spectral algorithms with different qualifications $\tau$. In particular, we find that these exact rate curves (as functions of $\gamma$) exhibit periodic plateau behavior and the polynomial approximation barrier. Consequently, we can fully depict the saturation effects of spectral algorithms and reveal a new phenomenon in large-dimensional settings: the saturation effect occurs in the large-dimensional setting as long as the source condition satisfies $s>\tau$, whereas in the fixed-dimensional setting it occurs as long as $s>2\tau$.
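The notion of qualification used above can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard filter-function forms from the spectral-regularization literature, namely $\varphi_\lambda(z) = 1/(z+\lambda)$ for KRR and $\varphi_\lambda(z) = (1 - e^{-z/\lambda})/z$ for gradient flow stopped at time $t = 1/\lambda$, and the usual definition that a filter has qualification $\tau$ when $\sup_z |1 - z\varphi_\lambda(z)|\, z^{\tau} \le C \lambda^{\tau}$. The function names, the grid, and the chosen values of $\lambda$ are illustrative assumptions.

```python
import numpy as np


def krr_filter(z, lam):
    """Tikhonov (KRR) filter, assumed form: phi(z) = 1 / (z + lam)."""
    return 1.0 / (z + lam)


def gradient_flow_filter(z, lam):
    """Gradient-flow filter with stopping time t = 1/lam (assumed form):
    phi(z) = (1 - exp(-z / lam)) / z, computed stably via expm1."""
    return -np.expm1(-z / lam) / z


def residual_ratio(filter_fn, lam, power, z_grid):
    """Grid approximation of sup_z |1 - z*phi(z)| * z**power / lam**power.

    If this stays bounded as lam -> 0, the filter has qualification
    at least `power`; if it blows up, the qualification is smaller.
    """
    residual = np.abs(1.0 - z_grid * filter_fn(z_grid, lam))
    return float(np.max(residual * z_grid**power) / lam**power)


# Eigenvalue grid on (0, 1] and a few regularization levels (illustrative).
z_grid = np.logspace(-8, 0, 2001)
lams = [1e-2, 1e-3, 1e-4]
powers = [1.0, 2.0]
filters = {"krr": krr_filter, "gradient_flow": gradient_flow_filter}

# ratios[name][power] is a list with one entry per value in `lams`.
ratios = {
    name: {p: [residual_ratio(fn, lam, p, z_grid) for lam in lams] for p in powers}
    for name, fn in filters.items()
}

for name in filters:
    for p in powers:
        print(name, "power", p, ratios[name][p])
```

Under these assumptions, the KRR ratio stays below 1 for power 1 but grows like $1/\lambda$ for power 2, which is the usual sense in which KRR has qualification $\tau = 1$. The gradient-flow ratio stays bounded by $(p/e)^p$ for every power $p$, so its qualification is unbounded. The sketch only illustrates the definition of $\tau$; it does not reproduce the large-dimensional rates discussed in the abstract.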
