On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

Yicheng Li; Zixiong Yu; Guhan Chen; Qian Lin

TL;DR

This paper develops a general framework to determine the eigenvalue decay rate (EDR) of a broad class of kernel functions on general domains, with the neural tangent kernel (NTK) as a central example. The authors transform and restrict kernels to reduce the analysis to dot-product kernels on the sphere, where spherical harmonics yield explicit spectral decompositions and known EDRs, and then show that the EDR is preserved under these transformations. They prove that the NTK associated with multilayer ReLU networks has an EDR of order $i^{-(d+1)/d}$ on general domains, matching the sphere case, under mild decay assumptions. Furthermore, they establish uniform convergence of wide neural network training to NTK regression, derive minimax-optimal rates via kernel regression theory, and demonstrate the necessity of early stopping to avoid overfitting. The results provide a principled explanation for neural network generalization in fixed dimension and offer a broadly applicable method for analyzing kernel spectra beyond the sphere, with potential extensions to other architectures and domain geometries.
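
As a reading aid, the decay-rate statement in the summary above can be written out as follows. The symbols $\lambda_i$, $\beta$, $c$ and $C$ are notation assumed here for illustration and are not quoted from the paper.

$$
\lambda_i \asymp i^{-\beta}
\quad\Longleftrightarrow\quad
c\, i^{-\beta} \le \lambda_i \le C\, i^{-\beta} \ \text{ for all } i \ge 1 \text{ and some constants } 0 < c \le C,
\qquad
\beta_{\mathrm{NTK}} = \frac{d+1}{d}.
$$

Here $\lambda_1 \ge \lambda_2 \ge \dots$ are assumed to be the eigenvalues of the integral operator associated with the kernel, listed in non-increasing order.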

Abstract

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernel (NTK) associated with neural networks of different depths and with various activation functions. After proving that the training dynamics of wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further establish the minimax optimality of wide neural networks provided that the underlying ground-truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that an overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.

Paper Structure (48 sections, 52 theorems, 163 equations, 1 figure, 1 table)

Introduction
Related works
The EDR of NTKs
The generalization performance of over-parameterized neural networks
The high-dimensional setting
Our contributions
Notations
Analysis of Eigenvalue Decay Rate
The integral operator and the eigenvalues
Preliminary results on the eigenvalues
Eigenvalues of kernels restricted on a subdomain
EDR of NTK on a general domain
Application: Optimal Rates of Over-parameterized Neural Networks
Setting of the neural network
Initialization
...and 33 more sections

Key Result

Proposition 1

Let $\rho: \mathcal{X} \to \mathbb{R}$ be a measurable function such that $\rho\odot k$ satisfies eq:IntegrableKernel. Then,

Figures (1)

Figure 1: Eigenvalue decay of NTK under uniform distribution on $[-1,1]^d$, where $i$ is selected in $[50,200]$ and $n = 1000$. The dashed black line represents the log least-square fit and the decay rates $r$ are reported.
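
The log least-squares fit mentioned in the caption can be sketched as follows. This is a minimal illustration on synthetic eigenvalues, not the authors' code: the function name `fit_decay_rate`, the prefactor 3.0, and the choice $d = 2$ are assumptions made for the example, and in practice the input would be the sorted eigenvalues of an empirical kernel matrix divided by $n$.

```python
import math


def fit_decay_rate(eigenvalues, i_min=50, i_max=200):
    """Estimate r in lambda_i ~ C * i**(-r) by ordinary least squares
    of log(lambda_i) against log(i) over the indices i_min..i_max.

    eigenvalues[0] is taken to be lambda_1 (1-based indexing in the math).
    """
    xs, ys = [], []
    for i in range(i_min, i_max + 1):
        lam = eigenvalues[i - 1]
        if lam <= 0:
            raise ValueError("eigenvalues must be positive to take logs")
        xs.append(math.log(i))
        ys.append(math.log(lam))

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is -r, so negate it to report a positive decay rate.
    return -sxy / sxx


# Synthetic spectrum with exact power-law decay i^{-(d+1)/d} for d = 2 (assumed).
d = 2
synthetic = [3.0 * i ** (-(d + 1) / d) for i in range(1, 1001)]
rate = fit_decay_rate(synthetic)
print(f"fitted decay rate: {rate:.4f}, target (d+1)/d = {(d + 1) / d:.4f}")
```

Restricting the fit to an intermediate index window such as $[50, 200]$ is a common way to avoid both the leading eigenvalues, where the asymptotic rate has not yet set in, and the tail, where finite-sample error dominates.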

Theorems & Definitions (60)

Proposition 1
Lemma 2
Proposition 3
Lemma 4
Proposition 5 (Widom, 1963)
Remark 7
Theorem 8
Proposition 9
Theorem 10
Remark 11
...and 50 more
