Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Yicheng Li; Weiye Gan; Zuoqiang Shi; Qian Lin

TL;DR

This work rigorously gives a full characterization of the generalization error curves of kernel gradient descent (and of a large class of analytic spectral algorithms) in kernel regression. The results sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification.

Abstract

The generalization error curve of a kernel regression method describes the exact order of the generalization error under various source conditions, noise levels and choices of the regularization parameter, rather than only the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and of a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification. Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.
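To make the notion of a spectral algorithm concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard textbook filter functions $\varphi_\lambda(z) = 1/(z+\lambda)$ for kernel ridge regression and $\varphi_\lambda(z) = (1 - e^{-z/\lambda})/z$ for continuous-time kernel gradient flow stopped at $t = 1/\lambda$ (used here as a stand-in for kernel gradient descent). The Gaussian kernel, the synthetic data and all function names are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian kernel matrix between 1-D sample arrays a and b (illustrative choice).
    d = a[:, None] - b[None, :]
    return np.exp(-gamma * d ** 2)

def krr_filter(z, lam):
    # Kernel ridge regression: phi_lambda(z) = 1 / (z + lambda).
    return 1.0 / (z + lam)

def gradient_flow_filter(z, lam):
    # Gradient flow at time t = 1/lambda: phi_lambda(z) = (1 - exp(-z/lambda)) / z.
    # expm1 keeps the ratio accurate for tiny z; the limit at z = 0 is 1/lambda.
    z = np.asarray(z, dtype=float)
    safe = np.where(z > 0, z, 1.0)
    return np.where(z > 0, -np.expm1(-safe / lam) / safe, 1.0 / lam)

def spectral_coefficients(K, y, lam, filt):
    # alpha = (1/n) * phi_lambda(K/n) y, so that f(x) = sum_i alpha_i k(x, x_i).
    n = len(y)
    evals, evecs = np.linalg.eigh(K / n)
    evals = np.clip(evals, 0.0, None)  # remove tiny negative round-off
    return evecs @ (filt(evals, lam) * (evecs.T @ y)) / n

# Synthetic 1-D regression problem (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 50
X = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * X) + 0.1 * rng.standard_normal(n)
K = rbf_kernel(X, X)
lam = 1e-2

alpha_krr = spectral_coefficients(K, y, lam, krr_filter)
alpha_direct = np.linalg.solve(K + n * lam * np.eye(n), y)  # closed-form ridge solution
alpha_gf = spectral_coefficients(K, y, lam, gradient_flow_filter)

# Predictions on a test grid for both filters.
X_test = np.linspace(-1.0, 1.0, 200)
f_krr = rbf_kernel(X_test, X) @ alpha_krr
f_gf = rbf_kernel(X_test, X) @ alpha_gf
```

Sweeping `lam` over a grid and recording the test error of `f_krr` and `f_gf` would trace out an empirical counterpart of a generalization error curve; the ridge filter reproduces the closed-form solution, which serves as a sanity check on the spectral formulation.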

Paper Structure (42 sections, 36 theorems, 230 equations, 2 figures)

Introduction
Related works
Optimality of kernel methods
Recent advances in kernel ridge regression
Kernel regression in the high-dimensional limit
Preliminaries
Reproducing kernel Hilbert space
Interpolation spaces
Regular RKHS
Spectral algorithm
Notations
Main results
More assumptions
Main theorem
Discussion
...and 27 more sections

Key Result

Proposition 2.2

Under the eigenvalue decay rate assumption (assu:EDR) and the regular RKHS assumption (assu:RegularRKHS), the embedding index is $\alpha_0 = 1/\beta$.

Figures (2)

Figure 1: An illustration of the filter functions $\varphi_\lambda$ and $\psi_\lambda$.
Figure 2: An illustration of the contour $\Gamma_\lambda$ defined in eq:contour. The region enclosed by $\Gamma_\lambda$ is exactly the region $D_\lambda$ in assu:Filter. The dashed interval $[0,\kappa^2]$ contains the spectra of $T$ and $T_X$.

Theorems & Definitions (74)

Remark 2.1
Example 2.1: Shift-invariant periodic kernels
Example 2.2: Dot-product kernel on the sphere
Example 2.3: Dot-product kernel on the ball
Proposition 2.2
Definition 2.3: Filter functions
Remark 2.4
Remark 2.5
Example 2.4: Kernel ridge regression
Example 2.5: Iterated ridge regression
...and 64 more
