Kernel ridge regression under power-law data: spectrum and generalization
Arie Wortsman, Bruno Loureiro
TL;DR
This work analyzes kernel ridge regression (KRR) on high-dimensional, anisotropic Gaussian data with power-law covariance, showing how data structure interacts with non-linear kernel features. The authors derive an exact asymptotic spectrum for polynomial inner-product kernels and link it to the decay of the data covariance. The kernel spectrum exhibits a spectral gap for α ∈ [0,1) and becomes continuous for α ≥ 1. In the high-dimensional regime α ∈ [0,1), the effective dimension, not the ambient dimension d, governs sample complexity. When the sample size n scales polynomially in d, KRR with a Hermite kernel learns primarily low-frequency components, with the learned degree bounded by D(κ) = ⌊κ/(1−α)⌋. The results indicate a fundamental statistical advantage of power-law anisotropic data over isotropic data, especially when the target aligns with the leading covariance directions, and are supported by numerical experiments illustrating the spectrum structure and the excess risk behavior. Overall, the paper connects high-dimensional isotropic analysis with classical source and capacity ideas, offering rigorous non-linear KRR results under power-law data.
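To make the degree bound concrete, the following minimal sketch evaluates D(κ) = ⌊κ/(1−α)⌋ for a few values. It assumes, as the summary does not state explicitly, that κ is the exponent in a sample-size scaling n ≍ d^κ and that α ∈ [0,1). The function name `learned_degree_bound` is hypothetical and not taken from the paper.

```python
import math


def learned_degree_bound(kappa: float, alpha: float) -> int:
    """Evaluate D(kappa) = floor(kappa / (1 - alpha)).

    Assumption (not stated in the summary): kappa is the exponent in
    n ~ d^kappa, and alpha lies in [0, 1) so the denominator is positive.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if kappa < 0.0:
        raise ValueError("kappa must be non-negative")
    # Floating-point division; values that land exactly on an integer
    # boundary may be affected by rounding for some inputs.
    return math.floor(kappa / (1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.75):
        print(alpha, learned_degree_bound(1.5, alpha))
```

Under these assumptions, a fixed κ yields a larger bound as α grows toward 1, which is consistent with the claimed advantage of power-law anisotropic data over the isotropic case α = 0.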
Abstract
In this work, we investigate high-dimensional kernel ridge regression (KRR) on i.i.d. Gaussian data with anisotropic power-law covariance. This setting differs fundamentally from the classical source & capacity conditions for KRR, where power-law assumptions are typically imposed on the kernel eigen-spectrum itself. Our contributions are twofold. First, we derive an explicit characterization of the kernel spectrum for polynomial inner-product kernels, giving a precise description of how the kernel eigen-spectrum inherits the data decay. Second, we provide an asymptotic analysis of the excess risk in the high-dimensional regime for a particular kernel with this spectral behavior, showing that the sample complexity is governed by the effective dimension of the data rather than the ambient dimension. These results establish a fundamental advantage of learning with power-law anisotropic data over isotropic data. To our knowledge, this is the first rigorous treatment of non-linear KRR under power-law data.
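The setting described above can be explored numerically with a short sketch. Everything specific in it is an assumption made for illustration rather than the paper's construction: the covariance is taken to be diagonal with eigenvalues k^(−α), the effective dimension is measured by the effective rank tr(Σ)/‖Σ‖_op (one common notion; the paper may use another), and the kernel is the polynomial inner-product kernel (1 + ⟨x, x'⟩/tr(Σ))^p with a normalization chosen here for convenience. All function names are hypothetical.

```python
import numpy as np


def power_law_spectrum(d: int, alpha: float) -> np.ndarray:
    """Assumed covariance eigenvalues lambda_k = k^(-alpha), k = 1..d."""
    return np.arange(1, d + 1, dtype=float) ** (-alpha)


def effective_rank(lams: np.ndarray) -> float:
    """tr(Sigma) / ||Sigma||_op, used here as a proxy for effective dimension."""
    return float(lams.sum() / lams.max())


def sample_gaussian(n: int, lams: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw n i.i.d. samples from N(0, diag(lams))."""
    return rng.standard_normal((n, lams.size)) * np.sqrt(lams)


def poly_kernel_gram(X: np.ndarray, lams: np.ndarray, degree: int = 2) -> np.ndarray:
    """Gram matrix of k(x, x') = (1 + <x, x'> / tr(Sigma))^degree (assumed normalization)."""
    return (1.0 + X @ X.T / lams.sum()) ** degree


def gram_eigenvalues(n: int = 200, d: int = 1000, alpha: float = 0.5,
                     degree: int = 2, seed: int = 0) -> np.ndarray:
    """Eigenvalues of K / n, sorted in decreasing order."""
    rng = np.random.default_rng(seed)
    lams = power_law_spectrum(d, alpha)
    X = sample_gaussian(n, lams, rng)
    K = poly_kernel_gram(X, lams, degree)
    return np.sort(np.linalg.eigvalsh(K / n))[::-1]


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.9):
        r = effective_rank(power_law_spectrum(1000, alpha))
        top = gram_eigenvalues(alpha=alpha)[:5]
        print(f"alpha={alpha}: effective rank ~ {r:.1f}, top eigenvalues {top}")
```

With eigenvalues k^(−α) and α ∈ [0,1), the effective rank grows roughly like d^(1−α), so it is much smaller than the ambient dimension d once α > 0. Plotting the sorted Gram eigenvalues on a log scale for several α is one way to inspect how the empirical kernel spectrum changes with the data decay.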
