Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels

Dongming Huang, Zhifan Li, Yicheng Li, Qian Lin

TL;DR

This work introduces the Effective Span Dimension (ESD) as an alignment-sensitive complexity measure that jointly accounts for the signal, spectrum, and noise level $\sigma^2$ in spectral algorithms with learned kernels. By mapping kernel methods to a sequence-model framework, the authors derive minimax rates $\asymp \sigma^2 K$ when the ESD is bounded by $K$, and show that over-parameterized gradient flow can reduce the ESD, thereby improving generalization in adaptive kernel settings. The framework is extended beyond fixed kernels to linear regression and RKHS regression, with supporting numerical experiments that illustrate how adapting eigenvalues and eigenfunctions can enhance spectral estimators. The results provide a unified, intrinsic view of generalization under kernel learning, explaining improvements from adaptivity without relying on traditional eigen-decay or source-condition assumptions. Overall, the ESD and span profile offer a principled language to analyze and compare learned representations in spectral methods, with implications for neural-network representations and beyond.

Abstract

We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals without requiring eigen-decay conditions or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD. This finding establishes a connection between adaptive feature learning and provable improvements in generalization of spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.

Paper Structure

This paper contains 61 sections, 26 theorems, 211 equations, 7 figures.

Key Result

Theorem 3.2

Let $\widehat{\bm{\theta}}^{\operatorname{PC},\nu}$ be the PC estimator in eq:pc-estimator for the sequence model in eq:SeqModel. Denote by $\mathcal{R}_{*}^{\operatorname{PC}}$ the minimal possible risk over all choices of $\nu$. Let $d^\dagger = d^\dagger(\sigma^2; \bm{\theta}^*, \bm{\lambda})$ be
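The theorem statement above is truncated in this listing, and the equations it refers to (eq:pc-estimator, eq:SeqModel) are not reproduced here. As a rough illustration only, the sketch below assumes the standard Gaussian sequence model $y_j = \theta^*_j + \sigma \xi_j$ and a PC estimator that keeps the $\nu$ coordinates with the largest eigenvalues $\lambda_j$ and zeroes the rest. Under those assumptions the risk is $\nu\sigma^2$ plus the squared norm of the dropped signal coordinates, and minimizing over $\nu$ gives an oracle risk in the spirit of $\mathcal{R}_{*}^{\operatorname{PC}}$. The function names are hypothetical, and the paper's definition of $d^\dagger$ is deliberately not implemented.

```python
def eigen_order(lam):
    """Indices sorted by decreasing eigenvalue (ties keep their original order)."""
    return sorted(range(len(lam)), key=lambda j: -lam[j])


def pc_estimate(y, lam, nu):
    """Assumed PC estimator: keep the nu coordinates of y with the largest
    eigenvalues and set every other coordinate to zero."""
    keep = set(eigen_order(lam)[:nu])
    return [y[j] if j in keep else 0.0 for j in range(len(y))]


def pc_risk(theta, lam, sigma2, nu):
    """Exact risk under the assumed model: variance nu * sigma2 from the kept
    coordinates plus squared bias from the dropped signal coordinates."""
    dropped = eigen_order(lam)[nu:]
    return nu * sigma2 + sum(theta[j] ** 2 for j in dropped)


def oracle_pc_risk(theta, lam, sigma2):
    """Minimize the risk over all truncation levels nu = 0, ..., len(theta)."""
    risks = [pc_risk(theta, lam, sigma2, nu) for nu in range(len(theta) + 1)]
    best_nu = min(range(len(risks)), key=risks.__getitem__)
    return best_nu, risks[best_nu]


# Same signal, two spectra: the signal sits in the last two coordinates.
theta = [0.0, 0.0, 1.0, 1.0]
aligned = oracle_pc_risk(theta, [1.0, 2.0, 4.0, 3.0], sigma2=0.1)
misaligned = oracle_pc_risk(theta, [4.0, 3.0, 2.0, 1.0], sigma2=0.1)
```

In this toy case the aligned spectrum reaches its oracle risk with two components, while the misaligned spectrum needs all four and pays twice the variance. This is the kind of alignment sensitivity that the ESD is described as capturing.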

Figures (7)

  • Figure 1: Evolution of span profiles during the training of an over-parameterized gradient flow. The misalignment level $q$ varies from $1$ to $3$. Fixed parameters are $n=10000$, $\sigma_0=1$, $d=5000$, $J=15$, $p=2.5$, and $\gamma=1$.
  • Figure 2: Averaged squared error of the tuned PC estimator and ESD as a function of the training time. Each average is computed based on 20 replications and each error bar represents a standard deviation.
  • Figure 3: Oracle PCR risk versus Effective Span Dimension for (a) geometric eigen-decay and (b) logarithmic eigen-decay. The dashed line plots $\text{Risk}\times n/\sigma_0^{2}$; the solid line is $d^{\dagger}(\alpha)$. The risk is computed based on 20 replications and the error bar represents the standard deviation.
  • Figure 4: Effective Span Dimension and Optimal KPCPE risk. The dashed line plots Risk $\times n/\sigma_{0}^{2}$; the solid line is $d^{\dagger}(\alpha)$. The risk is computed based on 20 replications and the error bar represents the standard deviation.
  • Figure 5: Pathwise ESD and risk under a learned kernel using a 4-layer linear network.
  • ...and 2 more figures

Theorems & Definitions (63)

  • Definition 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Example 3.4
  • Definition 3.5
  • Proposition 3.6
  • Example 4.2
  • Theorem 4.3
  • Example 4.4
  • Proposition 5.1
  • ...and 53 more