On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin

TL;DR

This paper develops a general framework to determine the eigenvalue decay rate (EDR) of a broad class of kernel functions on general domains, with the neural tangent kernel (NTK) as a central example. The authors transform and restrict kernels to reduce analysis to dot-product kernels on the sphere, where spherical harmonics yield explicit spectral decompositions and known EDRs, then show the EDR is preserved under these transformations. They prove that the NTK associated with multilayer ReLU networks has EDR $\bigl(i^{-(d+1)/d}\bigr)$ on general domains, matching the sphere case, under mild decay assumptions (cond:EDR). Furthermore, they establish uniform convergence of wide NN training to NTK regression, derive minimax-optimal rates via kernel regression theory, and demonstrate the necessity of early stopping to avoid overfitting. The results provide a principled explanation for neural network generalization in fixed dimension and offer a broadly applicable method for analyzing kernel spectra beyond the sphere, with potential extensions to other architectures and domain geometries.

Abstract

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks of different depths and various activation functions. After proving that the dynamics of training wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further illustrate the minimax optimality of wide neural networks provided that the ground-truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that an overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.
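
To make the quantities in the abstract concrete, the following worked equations sketch what an EDR of $i^{-(d+1)/d}$ means and how it would enter a convergence rate. The symbols $\lambda_i$, $\beta$, and $n$ (sample size) are notation introduced here for illustration. The rate exponent $s\beta/(s\beta+1)$ is the form standard in kernel regression theory and is stated as an assumption, not quoted from the paper.

```latex
% EDR: eigenvalues of the kernel integral operator, sorted in decreasing order
\lambda_i \asymp i^{-\beta},
\qquad
\beta = \frac{d+1}{d} \quad \text{(NTK of multilayer ReLU networks)}.

% Assumed standard rate for f \in [\mathcal{H}_{\mathrm{NTK}}]^{s}, with n samples
n^{-\frac{s\beta}{s\beta+1}}
\;=\;
n^{-\frac{s(d+1)}{s(d+1)+d}}
\qquad \text{after substituting } \beta = \tfrac{d+1}{d}.

% Example: d = 3, s = 1 gives \beta = 4/3 and exponent 4/7
n^{-\frac{4/3}{4/3+1}} = n^{-4/7}.
```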

Paper Structure

This paper contains 48 sections, 52 theorems, 163 equations, 1 figure, 1 table.

Key Result

Proposition 1

Let $\rho: \mathcal{X} \to \mathbb{R}$ be a measurable function such that $\rho\odot k$ satisfies eq:IntegrableKernel. Then,

Figures (1)

  • Figure 1: Eigenvalue decay of NTK under uniform distribution on $[-1,1]^d$, where $i$ is selected in $[50,200]$ and $n = 1000$. The dashed black line represents the log least-square fit and the decay rates $r$ are reported.
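
The experiment described in the Figure 1 caption can be approximated with a short NumPy sketch. It is not the authors' code. It assumes the closed-form NTK of a two-layer ReLU network (the paper treats multilayer networks), handles the bias by appending a constant coordinate to each input, and uses the eigenvalues of the scaled Gram matrix $K/n$ as a proxy for the operator eigenvalues. The function names `relu_ntk_two_layer` and `empirical_decay_rate` are hypothetical.

```python
import numpy as np


def relu_ntk_two_layer(X, Y):
    """Closed-form NTK of a two-layer ReLU network (assumed form, no explicit bias)."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)
    # Cosine of the angle between inputs, clipped for numerical safety.
    u = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)
    theta = np.arccos(u)
    # Arc-cosine kernels of degree 0 and 1.
    kappa0 = (np.pi - theta) / np.pi
    kappa1 = (np.sqrt(1.0 - u**2) + u * (np.pi - theta)) / np.pi
    return (nx * ny.T) * (kappa1 + u * kappa0)


def empirical_decay_rate(d, n=1000, i_min=50, i_max=200, seed=0):
    """Estimate the decay rate r in lambda_i ~ i^{-r} by a log-log least-squares fit."""
    rng = np.random.default_rng(seed)
    # Uniform samples on [-1, 1]^d, as in the figure caption.
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    # Append a constant coordinate as a stand-in for the bias term (assumption).
    X_aug = np.hstack([X, np.ones((n, 1))])
    K = relu_ntk_two_layer(X_aug, X_aug)
    # Eigenvalues of K / n, sorted in decreasing order.
    eigs = np.linalg.eigvalsh(K / n)[::-1]
    idx = np.arange(i_min, i_max + 1)
    slope, _ = np.polyfit(np.log(idx), np.log(eigs[idx - 1]), deg=1)
    return -slope, eigs


if __name__ == "__main__":
    for d in (1, 2, 3):
        r, _ = empirical_decay_rate(d)
        print(f"d={d}: fitted r={r:.3f}, (d+1)/d={(d + 1) / d:.3f}")
```

The fitted `r` can be compared against $(d+1)/d$. With finite $n$ the estimate depends on the fitting window and the random seed, so exact agreement should not be expected.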

Theorems & Definitions (60)

  • Proposition 1
  • Lemma 2
  • Proposition 3
  • Lemma 4
  • Proposition 5 (Widom, 1963)
  • Remark 7
  • Theorem 8
  • Proposition 9
  • Theorem 10
  • Remark 11
  • ...and 50 more