Diagonal Over-parameterization in Reproducing Kernel Hilbert Spaces as an Adaptive Feature Model: Generalization and Adaptivity

Yicheng Li, Qian Lin

TL;DR

The paper develops a diagonal adaptive kernel regression that jointly learns kernel eigenvalues and output coefficients during training, enabling feature learning beyond fixed kernels and the neural tangent kernel regime. By formalizing gradient-flow dynamics and extending to deeper parameterizations, it shows that eigenvalue adaptation can yield near-oracle generalization rates even under misalignment, and that extra depth further enhances adaptability and generalization. Theoretical results provide nonparametric regression bounds and explicit dynamics for signal vs. noise components, while numerical simulations corroborate the advantage of adaptivity in learning the true directions. Overall, the work offers a new perspective on adaptivity and generalization in neural networks beyond fixed-kernel analyses by linking over-parameterization to eigenvalue learning within an RKHS framework.

Abstract

This paper introduces a diagonal adaptive kernel model that learns kernel eigenvalues and output coefficients simultaneously during training. Unlike fixed-kernel methods tied to neural tangent kernel theory, the diagonal adaptive kernel model adapts to the structure of the true function, significantly improving generalization over fixed-kernel methods, especially when the initial kernel is misaligned with the target. Moreover, we show that the adaptivity comes from learning the right eigenvalues during training, which is a form of feature learning. By extending to deeper parameterizations, we further show how extra depth enhances adaptability and generalization. This study combines insights from feature learning and implicit regularization and provides a new perspective on the adaptivity and generalization potential of neural networks beyond the kernel regime.
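
To make the idea of "learning eigenvalues and output coefficients simultaneously" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a noiseless sequence-model view in the kernel's eigenbasis, where each fitted coefficient is the product $\theta_j = a_j \beta_j$, with $a_j$ initialized at $\lambda_j^{1/2}$ and $\beta_j$ at zero, and both trained by plain gradient descent on a squared loss. The function name `train_diagonal`, the step size, and the toy values are all hypothetical choices for illustration.

```python
import numpy as np


def train_diagonal(z, lam, steps, lr, adaptive=True):
    """Gradient descent on 0.5 * sum_j (a_j * beta_j - z_j)**2.

    z        : target coefficients in the eigenbasis (assumed setup)
    lam      : initial kernel eigenvalues; a_j starts at sqrt(lam_j)
    adaptive : if False, a_j is frozen (fixed-kernel baseline)
    """
    z = np.asarray(z, dtype=float)
    a = np.sqrt(np.asarray(lam, dtype=float))
    beta = np.zeros_like(z)
    for _ in range(steps):
        resid = a * beta - z          # per-coordinate residual
        grad_a = resid * beta         # d loss / d a_j
        grad_beta = resid * a         # d loss / d beta_j
        if adaptive:
            a = a - lr * grad_a       # the "eigenvalue" side is learned too
        beta = beta - lr * grad_beta
    return a, beta


# Toy misaligned case: both coordinates carry equal signal,
# but the initial kernel gives the second one a tiny eigenvalue.
z = np.array([1.0, 1.0])
lam = np.array([1.0, 0.01])

a_ad, b_ad = train_diagonal(z, lam, steps=5000, lr=0.05)
a_fx, b_fx = train_diagonal(z, lam, steps=5000, lr=0.05, adaptive=False)

print("adaptive fit:", a_ad * b_ad)
print("fixed fit:   ", a_fx * b_fx)
print("learned a**2:", a_ad ** 2)
```

In this sketch, `a**2` plays the role of the learned eigenvalue. With `adaptive=False`, the second coordinate is fitted slowly because its eigenvalue stays at 0.01; with `adaptive=True`, `a` grows along the direction that carries signal. The example ignores noise and early stopping, both of which matter for the generalization results the paper states.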

Paper Structure

This paper contains 34 sections, 30 theorems, 246 equations, 2 figures.

Key Result

Theorem 1

Suppose that Assumption (EigenSystem) and Assumption (SignalSpan) hold. Then, for any $s > 0$, by choosing $t \asymp n^{1/2}$, when $n$ is sufficiently large, with probability at least $1-C/n^2$, we have [displayed bound missing in the source]. Here, the constant $C$ and the hidden constants may depend on the constants in the assumptions and the choice of $s$.

Figures (2)

  • Figure 1: Comparison of the fixed kernel method and the diagonal adaptive kernel method on a $d=2$ example with low-dimensional structure. The upper-left panel shows the generalization error curve. The upper-right panel shows the true coefficients, and the lower two rows show the evolution of the coefficients for the fixed kernel method and the diagonal adaptive kernel method. Since the indices of the eigenfunctions are 2-dimensional, we plot the coefficients in a 2D grid.
  • Figure 2: Comparison of the fixed kernel method and the diagonal adaptive kernel method in various settings. The three rows correspond to dimensions $d=2,3,4$ in Example 2 (low-dimensional structure), respectively. The error bars represent the standard deviation over 32 runs.

Theorems & Definitions (33)

  • Example 1: Eigenfunctions in common order
  • Example 2: Low-dimensional structure
  • Example 3: Misalignment
  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Proposition 4
  • Lemma 5
  • Proposition 6: Shrinkage monotonicity and shrinkage time
  • Lemma 7: Perturbation bound
  • ...and 23 more