Kernel ridge regression under power-law data: spectrum and generalization

Arie Wortsman, Bruno Loureiro

TL;DR

This work analyzes kernel ridge regression (KRR) on high‑dimensional, anisotropic Gaussian data with power‑law covariance, showing how data structure interacts with non‑linear kernel features. By deriving an exact asymptotic spectrum for polynomial inner‑product kernels and linking it to the decay of the data covariance, the authors show that the effective dimension, not the ambient dimension, governs sample complexity in the high‑dimensional regime α ∈ [0,1). In particular, they identify a spectral gap regime for α ∈ [0,1) that transitions to a continuous spectrum for α ≥ 1, and prove that KRR with a Hermite kernel learns primarily low‑frequency components when n scales polynomially in d, with the learned degree bounded by D(κ) = ⌊κ/(1−α)⌋. The results indicate a fundamental statistical advantage of power‑law anisotropic data, especially when the target aligns with the leading covariance directions, and are supported by numerical experiments illustrating the spectrum structure and the excess risk. Overall, the paper connects high‑dimensional isotropic analysis with the classical source and capacity conditions, offering rigorous non‑linear KRR results under power‑law data.
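
As a worked illustration of the degree bound, assume that κ denotes the sample-size exponent, i.e. $n = d^{\kappa}$ (this reading of κ is an assumption; the summary does not define it). Taking $\kappa = 1.2$ as an arbitrary example value:

$$
\alpha = 0:\quad D(1.2) = \left\lfloor \frac{1.2}{1} \right\rfloor = 1,
\qquad
\alpha = 0.5:\quad D(1.2) = \left\lfloor \frac{1.2}{0.5} \right\rfloor = \lfloor 2.4 \rfloor = 2 .
$$

Under this reading, the bound allows one more polynomial degree at $\alpha = 0.5$ than in the isotropic case $\alpha = 0$, at the same sample-size exponent.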

Abstract

In this work, we investigate high-dimensional kernel ridge regression (KRR) on i.i.d. Gaussian data with anisotropic power-law covariance. This setting differs fundamentally from the classical source & capacity conditions for KRR, where power-law assumptions are typically imposed on the kernel eigen-spectrum itself. Our contributions are twofold. First, we derive an explicit characterization of the kernel spectrum for polynomial inner-product kernels, giving a precise description of how the kernel eigen-spectrum inherits the data decay. Second, we provide an asymptotic analysis of the excess risk in the high-dimensional regime for a particular kernel with this spectral behavior, showing that the sample complexity is governed by the effective dimension of the data rather than the ambient dimension. These results establish a fundamental advantage of learning with power-law anisotropic data over isotropic data. To our knowledge, this is the first rigorous treatment of non-linear KRR under power-law data.
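
The role of the effective dimension can be made concrete with a minimal numerical sketch. It assumes the unnormalized power-law spectrum $\sigma_j = j^{-\alpha}$ and the common definition $r_0(\Sigma) = \mathrm{Tr}(\Sigma)/\|\Sigma\|_{\rm op}$; the paper's Definition 1 is not reproduced in this summary, so both choices, as well as the function names, are illustrative assumptions. The exponent $b = \log r_0(\Sigma)/\log d$ is the quantity named in the Figure 1 caption.

```python
import numpy as np

def power_law_spectrum(d, alpha):
    """Eigenvalues sigma_j = j^(-alpha), j = 1..d (assumed normalization)."""
    return np.arange(1, d + 1, dtype=float) ** (-alpha)

def effective_dimension(sigma):
    """r_0(Sigma) = trace(Sigma) / largest eigenvalue (assumed definition)."""
    sigma = np.asarray(sigma, dtype=float)
    return sigma.sum() / sigma.max()

def gap_exponent(d, alpha):
    """b = log r_0(Sigma) / log d for the power-law spectrum above."""
    r0 = effective_dimension(power_law_spectrum(d, alpha))
    return np.log(r0) / np.log(d)
```

With these assumptions, the isotropic case $\alpha = 0$ gives $r_0(\Sigma) = d$ and $b = 1$. For $\alpha \in (0,1)$, the trace grows like $d^{1-\alpha}$ up to constants, so $b$ slowly approaches $1-\alpha$ as $d$ grows, which is one way to see why the effective dimension is polynomially smaller than the ambient one.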

Paper Structure

This paper contains 19 sections, 23 theorems, 174 equations, 4 figures.

Key Result

Proposition 1

Let $\sigma_1, \dots, \sigma_d \in \mathbb{R}_{+}$, and define the diagonal covariance matrix $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_d)$. Then the integral operator $T_{\leq D}$ associated to the truncated kernel: has $\binom{d + D}{D}$ non-zero eigenvalues. Moreover, for each multi-index $\beta \in \mathbb{Z}^{d}_{\geq 0}$, with $|\beta|= \beta_1 + \dots +\beta_d \leq D$, there exists a
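
The eigenvalue count in Proposition 1 coincides with the number of multi-indices $\beta \in \mathbb{Z}^{d}_{\geq 0}$ with $|\beta| \leq D$, which is $\binom{d+D}{D}$. The following brute-force check is only a sanity sketch for small $d$ and $D$; the helper name `multi_indices` is hypothetical and not taken from the paper.

```python
from itertools import product
from math import comb

def multi_indices(d, D):
    """All beta in Z_{>=0}^d with |beta| = beta_1 + ... + beta_d <= D.

    Brute-force enumeration over {0, ..., D}^d, so only suitable for small d and D.
    """
    return [beta for beta in product(range(D + 1), repeat=d) if sum(beta) <= D]

# Compare the enumerated count with the binomial coefficient for one small case.
d, D = 3, 4
betas = multi_indices(d, D)
assert len(betas) == comb(d + D, D)
```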

Figures (4)

  • Figure 1: Illustration of the kernel spectrum for $\alpha \in [\frac{1}{\ell + 1},\frac{1}{\ell})$, for $\ell \in \mathbb{N}$, from \ref{prop:spectrum_power_law_final}, shown in normalized log–log scale and highlighting both the spectral gap and continuous regions. The grey solid horizontal line corresponds to the isotropic case, where the degenerate eigenvalues are grouped into piecewise constant levels: at each level $m \geq 0$, there are $\Theta(d^{m})$ eigenvalues of magnitude $\Theta(d^{-m})$. By contrast, the black solid line depicts the anisotropic case with $\alpha \in (0,1)$, where the spectrum separates into two distinct regimes. In the spectral gap region, on the left of the figure, levels $m \leq \kappa_{\rm cont.}= \ell$ contain $\Theta(d^{m})$ non-degenerate eigenvalues of order $\Theta(r_0(\Sigma)^{-m})$ and increasing steepness, with successive levels separated by spectral gaps of decreasing size, starting at multiples of $b=\log{r_0(\Sigma)}/\log{d}$. Beyond this, in the continuous region $m \geq \kappa_{\rm cont.}$, the gaps disappear and the eigenvalues overlap across levels, yielding a continuous spectrum that becomes increasingly steep at each level $m$.
  • Figure 2: Left: Theoretical spectrum of the kernel obtained by truncating $k(x,x') = \exp(\langle x, x' \rangle)$ at the 5th degree of its Taylor expansion, with $x,x' \sim \gamma_d^{\alpha}$ for $\alpha \in \{ 0, 0.3, 0.7, 1.05\}$ and $d = 20$. Right: Theoretical spectrum of the kernel $k(x,x') = ( 1 + \langle x, x' \rangle)^3$, with $x,x' \sim \gamma_d^{\alpha}$ for $\alpha \in \{ 0, 0.3, 0.7, 1.05\}$ and $d = 100$.
  • Figure 3: Theoretical spectrum of the polynomial kernel $K(x,x') = \langle x,x'\rangle^3$ with $d = 100$. Dashed lines correspond to the function $C \cdot i^{-\alpha}$ for each value of $\alpha \in \{1.01, 1.5, 2\}$.
  • Figure 4: Excess risk for the kernel in Equation \ref{def:hermite_kernel} with maximum degree equal to $3$, $d = 100$, and $\lambda =0.01$. The target function is of the form $f_{\star}(x) = {\rm He}_{1}(z_{i}) + {\rm He}_{2}(z_{i}) + {\rm He}_{3}(z_{i})$. In the first plot (Left), we take $i=1$, while in the second (Right), we take $i=d$. Plots are obtained by averaging over $10$ seeds, and bars denote the standard deviation.
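
The figures above show theoretical spectra. As a rough empirical counterpart, the sketch below samples Gaussian data with a power-law covariance and computes the eigenvalues of the normalized Gram matrix of the cubic kernel $k(x,x') = (1 + \langle x, x' \rangle)^3$ from Figure 2 (Right). It assumes $\gamma_d^{\alpha} = \mathcal{N}(0, \Sigma)$ with $\sigma_j = j^{-\alpha}$, which may differ from the paper's normalization, and the function name and parameters are hypothetical. This is not the procedure used to produce the figures.

```python
import numpy as np

def empirical_kernel_spectrum(n, d, alpha, degree=3, seed=0):
    """Eigenvalues (descending) of K/n for k(x, x') = (1 + <x, x'>)^degree.

    Data are drawn as x ~ N(0, diag(sigma)) with sigma_j = j^(-alpha),
    an assumed normalization of the power-law covariance.
    """
    rng = np.random.default_rng(seed)
    sigma = np.arange(1, d + 1, dtype=float) ** (-alpha)
    # Scale standard Gaussian columns by sqrt(sigma_j) to get covariance diag(sigma).
    X = rng.standard_normal((n, d)) * np.sqrt(sigma)
    K = (1.0 + X @ X.T) ** degree
    # K is symmetric, so eigvalsh applies; it returns ascending order.
    eigvals = np.linalg.eigvalsh(K / n)
    return eigvals[::-1]
```

For example, `empirical_kernel_spectrum(n=2000, d=100, alpha=0.7)` returns values that can be plotted in log–log scale against the index. The eigenvalues of the normalized Gram matrix only approximate those of the integral operator, and the smaller eigenvalues are the least reliable at finite $n$.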

Theorems & Definitions (46)

  • Definition 1: Effective Dimension
  • Remark 1
  • Proposition 1
  • Proof: Sketch of the proof
  • Remark 2
  • Remark 3: Isotropic case
  • Corollary 1
  • Corollary 2
  • Proof: Sketch of the proof
  • Proposition 2
  • ...and 36 more