Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels
Dongming Huang, Zhifan Li, Yicheng Li, Qian Lin
TL;DR
This work introduces the Effective Span Dimension (ESD), an alignment-sensitive complexity measure that jointly accounts for the signal, the spectrum, and the noise level $\sigma^2$ in spectral algorithms with learned kernels. By mapping kernel methods to a sequence-model framework, the authors show that the minimax excess risk is $\asymp \sigma^2 K$ when the ESD is at most $K$, and that over-parameterized gradient flow can reduce the ESD, thereby improving generalization in adaptive kernel settings. The framework extends to linear regression and RKHS regression, with supporting numerical experiments that illustrate how adapting eigenvalues and eigenfunctions can improve spectral estimators. The results give a unified, intrinsic view of generalization under kernel learning, explaining the gains from adaptivity without relying on traditional eigen-decay or source-condition assumptions. Overall, the ESD and span profile offer a principled language for analyzing and comparing learned representations in spectral methods, with implications for neural-network representations and beyond.
Abstract
We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals without requiring eigen-decay conditions or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD. This finding establishes a connection between adaptive feature learning and provable improvements in generalization of spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.
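To make the $\sigma^2 K$ scaling concrete, the following is a minimal sketch in a Gaussian sequence model $y_j = \theta_j + \sigma \varepsilon_j$, with coordinates assumed to be ordered by decreasing eigenvalue. It uses the standard fact that the estimator keeping only the first $k$ coordinates has risk $k\sigma^2 + \sum_{j>k} \theta_j^2$. The function `smallest_balanced_k` is a hypothetical proxy (the smallest $k$ whose tail energy is at most $k\sigma^2$) introduced here for illustration only; the abstract does not state the formal ESD definition, so it should not be read as the authors' definition.

```python
import numpy as np


def truncation_risk(theta, sigma, k):
    """Exact risk of the estimator that keeps y_1..y_k and zeroes the rest.

    Model: y_j = theta_j + sigma * eps_j with eps_j ~ N(0, 1).
    Risk = k * sigma^2 (variance) + sum_{j > k} theta_j^2 (squared bias).
    """
    theta = np.asarray(theta, dtype=float)
    variance = k * sigma**2
    bias_sq = float(np.sum(theta[k:] ** 2))
    return variance + bias_sq


def smallest_balanced_k(theta, sigma):
    """Hypothetical proxy (an assumption, not the paper's definition):
    the smallest k >= 1 such that the tail energy is at most k * sigma^2.
    """
    theta = np.asarray(theta, dtype=float)
    for k in range(1, len(theta) + 1):
        if np.sum(theta[k:] ** 2) <= k * sigma**2:
            return k
    return len(theta)


# Same signal energy, different alignment with the leading coordinates.
aligned = [2.0, 0.0, 0.0]      # signal sits on the first coordinate
misaligned = [0.0, 0.0, 2.0]   # signal sits on the last coordinate
sigma = 1.0

k_aligned = smallest_balanced_k(aligned, sigma)        # 1
k_misaligned = smallest_balanced_k(misaligned, sigma)  # 3

risk_aligned = truncation_risk(aligned, sigma, k_aligned)           # 1.0
risk_misaligned = truncation_risk(misaligned, sigma, k_misaligned)  # 3.0
```

In this toy setting the truncation risk at the balanced $k$ always lies between $k\sigma^2$ and $2k\sigma^2$, which is one way to see why a $\sigma^2 K$ rate is natural. The two signals have the same energy but different proxy values, illustrating what "alignment-sensitive" means: reordering or adapting the basis so that the signal loads on the leading coordinates lowers the achievable risk.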
