Regularized least squares learning with heavy-tailed noise is minimax optimal

Mattes Mollenhauer, Nicole Mücke, Dimitri Meunier, Arthur Gretton

TL;DR

This work analyzes kernel ridge regression in RKHS under heavy-tailed noise with finite higher moments, establishing minimax-optimal excess risk bounds. It derives both capacity-free and capacity-dependent bounds by leveraging a Hilbert-space version of the Fuk–Nagaev inequality, producing a mixed tail (subgaussian and polynomial) structure in the excess risk. With standard eigenvalue decay, the capacity-dependent rates match known minimax limits, showing robustness of regularized least squares to heavy tails without requiring subexponential noise. Practically, the results inform regularization strength and confidence-level behavior in high-stakes settings and suggest avenues for extending the framework to other losses and misspecified or more general models.

Abstract

This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well-known integral operator framework. The dominant subgaussian component allows us to achieve convergence rates that have previously only been derived under subexponential noise, a prevalent assumption in related work from the last two decades. These rates are optimal under standard eigenvalue decay conditions, demonstrating the asymptotic robustness of regularized least squares against heavy-tailed noise. Our derivations are based on a Fuk–Nagaev inequality for Hilbert-space-valued random variables.
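For orientation, the estimator studied here can be written down explicitly. The display below is the standard textbook form of kernel ridge regression, not a formula quoted from the paper: the symbols $\mathcal{H}$ (the RKHS), $k$ (its kernel), $K$ (the kernel Gram matrix) and $c$ (the coefficient vector) are notation assumed for this sketch, and the paper's own normalization of the regularization parameter $\alpha$ may differ by a constant factor.

```latex
% Regularized least squares over the RKHS \mathcal{H} (assumed notation)
\widehat{f}_\alpha
  = \operatorname*{arg\,min}_{f \in \mathcal{H}}
    \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
    + \alpha \lVert f \rVert_{\mathcal{H}}^2 ,
  \qquad \alpha > 0 .

% Representer theorem: closed form in terms of the Gram matrix
\widehat{f}_\alpha = \sum_{i=1}^{n} c_i \, k(x_i, \cdot),
  \qquad
  c = (K + n \alpha I)^{-1} y,
  \qquad
  K_{ij} = k(x_i, x_j) .
```

The noise enters only through the response vector $y$, which is why the tail behavior of the noise carries over to the tail behavior of the excess risk.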

Paper Structure

This paper contains 59 sections, 21 theorems, 187 equations, 2 figures.

Key Result

Proposition 3.1

Let eq:mom and eq:src be satisfied. For all $\delta \in (0, 1)$ and $n \in \mathbb{N}$ such that we have with confidence $1-\delta$, with where $0 < C_\diamond$ is given in eq:diamond and $c_1 \geq 1$ is the constant from prop:fn_modified depending only on $q$.

Figures (2)

  • Figure 3.1: Illustration of the effective sample size $n_0$ ensuring subgaussian behavior of the term $\eta(\delta, n)$ defined in eq:effective_samples for different choices of $q$. For simplicity, we set $c_1 = 1$, $\sigma = 2$ and $Q = 10$.
  • Figure A.1: Empirical approximations of the quantile function of the excess risk $\lVert I_\pi\widehat{f}_\alpha - f_\star \rVert_{L^2(\pi)}$ for (a) the light-tailed noise model given in eq:experiment_light and (b) the $t$-distributed noise model given in eq:experiment_heavy, for different choices of the regularization parameter $\alpha$ and fixed sample size $n=20$.
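
An experiment of the kind described for Figure A.1 can be sketched as a small Monte Carlo simulation. The following is a minimal sketch, not the paper's setup: the noise models eq:experiment_light and eq:experiment_heavy are not reproduced on this page, so the target function $\sin(2\pi x)$, the uniform design on $[0, 1]$, the Gaussian kernel and its bandwidth, the standard normal noise, and the $t$-distribution with 3 degrees of freedom are all assumptions chosen for illustration. Only the sample size $n = 20$ and the idea of comparing quantiles across values of $\alpha$ come from the caption.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def krr_fit_predict(x_train, y_train, x_eval, alpha):
    """Kernel ridge regression: solve (K + n*alpha*I) c = y, then predict."""
    n = len(x_train)
    gram = gaussian_kernel(x_train, x_train)
    coef = np.linalg.solve(gram + n * alpha * np.eye(n), y_train)
    return gaussian_kernel(x_eval, x_train) @ coef


def excess_risk_quantiles(noise, alpha, n=20, trials=500,
                          levels=(0.5, 0.9, 0.99), seed=0):
    """Empirical quantiles of the L2 error of KRR over repeated samples.

    noise: "gaussian" (light-tailed) or "student_t" (heavy-tailed, df=3).
    Both noise models are illustrative assumptions, not the paper's.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 200)      # grid approximating L2 of Uniform[0, 1]
    f_star = np.sin(2.0 * np.pi * grid)    # assumed regression function
    errors = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(0.0, 1.0, n)
        if noise == "gaussian":
            eps = rng.normal(0.0, 1.0, n)
        else:
            eps = rng.standard_t(df=3, size=n)
        y = np.sin(2.0 * np.pi * x) + eps
        pred = krr_fit_predict(x, y, grid, alpha)
        errors[t] = np.sqrt(np.mean((pred - f_star) ** 2))
    return np.quantile(errors, levels)


if __name__ == "__main__":
    for alpha in (1e-3, 1e-2, 1e-1):
        light = excess_risk_quantiles("gaussian", alpha)
        heavy = excess_risk_quantiles("student_t", alpha)
        print(f"alpha={alpha:g}  light={light}  heavy={heavy}")
```

Comparing the upper quantiles (for example the 0.99 level) between the two noise models is the part most relevant to the mixed subgaussian and polynomial tail discussed in the paper; the median is typically much less sensitive to the noise tails.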

Theorems & Definitions (35)

  • Remark 2.4: Well-specified case
  • Proposition 3.1: Main excess risk bound
  • Corollary 3.2: Subgaussian confidence regime
  • Remark 3.3: Convergence rates
  • Definition 4.1: Effective dimension
  • Proposition 4.3: Capacity-dependent excess risk bound
  • Corollary 4.4: Convergence rates
  • Remark 4.5: Optimality of rates
  • Proposition 5.1: Fuk–Nagaev inequality; Hilbert space version
  • Remark 5.2
  • ...and 25 more
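
The effective dimension listed above as Definition 4.1 is the quantity that links eigenvalue decay to capacity-dependent rates. The sketch below uses the definition common in the integral-operator literature, $\mathcal{N}(\alpha) = \sum_j \mu_j / (\mu_j + \alpha)$ for the eigenvalues $\mu_j$ of the covariance operator; it is assumed, not confirmed from this page, that the paper's definition coincides with it. The polynomial decay $\mu_j = j^{-2}$ and the truncation at $10^5$ eigenvalues are illustrative choices.

```python
import numpy as np


def effective_dimension(eigenvalues, alpha):
    """N(alpha) = sum_j mu_j / (mu_j + alpha), the trace of T (T + alpha)^{-1}."""
    mu = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(mu / (mu + alpha)))


if __name__ == "__main__":
    # Assumed polynomial eigenvalue decay mu_j = j^{-2}, truncated for computation.
    mu = np.arange(1, 100_001, dtype=float) ** -2.0
    for alpha in (1e-1, 1e-2, 1e-3):
        n_alpha = effective_dimension(mu, alpha)
        # For this decay, N(alpha) grows like alpha^{-1/2} as alpha shrinks.
        print(f"alpha={alpha:g}  N(alpha)={n_alpha:.3f}  "
              f"N(alpha)*sqrt(alpha)={n_alpha * np.sqrt(alpha):.3f}")
```

For this decay the product $\mathcal{N}(\alpha)\sqrt{\alpha}$ stays bounded as $\alpha$ decreases, which is the kind of growth condition that capacity-dependent bounds typically exploit when balancing the regularization parameter against the sample size.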