Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise

Yunlong Feng, Qiang Wu

TL;DR

The paper addresses robust nonparametric regression under heavy-tailed noise and unbounded hypothesis spaces, proposing a shift from excess robust risk to prediction error measured against the true target $f^\star$. It introduces a probabilistic localization via a probabilistic effective hypothesis space $\mathcal{H}_\sigma$ and proves two-sided comparison theorems that relate excess robust risk to $L_2$ prediction error up to a tail-dependent residual $\mathcal{O}(\sigma^{-2\epsilon})$. It then provides nonasymptotic error bounds and explicit rate results under weak $(1+\epsilon)$-moment conditions, guiding joint tuning of the robustness scale $\sigma$ and regularization $\lambda$. The framework, demonstrated on nonparametric Huber regression in RKHS, generalizes to other robust losses and highlights prediction error as the fundamental lens for robust learning, with implications for kernel methods and beyond.

Abstract

We investigate robust nonparametric regression in the presence of heavy-tailed noise, where the hypothesis class may contain unbounded functions and robustness is ensured via a robust loss function $\ell_\sigma$. Using Huber regression as a close-up example within Tikhonov-regularized risk minimization in reproducing kernel Hilbert spaces (RKHS), we address two central challenges: (i) the breakdown of standard concentration tools under weak moment assumptions, and (ii) the analytical difficulties introduced by unbounded hypothesis spaces. Our first message is conceptual: conventional generalization-error bounds for robust losses do not faithfully capture out-of-sample performance. We argue that learnability should instead be quantified through prediction error, namely the $L_2$-distance to the truth $f^\star$, which is $\sigma$-independent and directly reflects the target of robust estimation. To make this workable under unboundedness, we introduce a probabilistic effective hypothesis space that confines the estimator with high probability and enables a meaningful bias-variance decomposition under weak $(1+\epsilon)$-moment conditions. Technically, we establish new comparison theorems linking the excess robust risk to the $L_2$ prediction error up to a residual of order $\mathcal{O}(\sigma^{-2\epsilon})$, clarifying the robustness-bias trade-off induced by the scale parameter $\sigma$. Building on this, we derive explicit finite-sample error bounds and convergence rates for Huber regression in RKHS that hold without uniform boundedness and under heavy-tailed noise. Our study delivers principled tuning rules, extends beyond Huber to other robust losses, and highlights prediction error, not excess generalization risk, as the fundamental lens for analyzing robust learning.
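To make the estimator described in the abstract concrete, the following is a minimal sketch of Tikhonov-regularized Huber regression in an RKHS. It is not taken from the paper. Several choices are assumptions made for illustration: the scaling of the Huber loss ($t^2$ for $|t| \le \sigma$, and $2\sigma|t| - \sigma^2$ otherwise), the Gaussian kernel, the objective $\frac{1}{n}\sum_i \ell_\sigma(y_i - f(x_i)) + \lambda \|f\|_K^2$, and the solver (iteratively reweighted least squares). The function names `huber_loss`, `gaussian_kernel`, `kernel_huber_fit`, and `objective` are hypothetical.

```python
import numpy as np


def huber_loss(t, sigma):
    """Huber loss with scale sigma (assumed scaling): quadratic inside
    [-sigma, sigma], linear outside, continuous at |t| = sigma."""
    a = np.abs(t)
    return np.where(a <= sigma, t ** 2, 2.0 * sigma * a - sigma ** 2)


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def objective(alpha, K, y, sigma, lam):
    """Regularized empirical Huber risk for f = sum_j alpha_j K(x_j, .)."""
    return np.mean(huber_loss(y - K @ alpha, sigma)) + lam * alpha @ K @ alpha


def kernel_huber_fit(K, y, sigma, lam, n_iter=50):
    """Fit the coefficients alpha by iteratively reweighted least squares.

    Each step solves (W K + n * lam * I) alpha = W y, where W is diagonal
    with weights w_i = min(1, sigma / |r_i|) from the current residuals.
    """
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        r = y - K @ alpha
        # Residuals larger than sigma are down-weighted; the rest keep weight 1.
        w = np.minimum(1.0, sigma / np.maximum(np.abs(r), 1e-12))
        alpha = np.linalg.solve(w[:, None] * K + n * lam * np.eye(n), w * y)
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(-1.0, 1.0, size=(n, 1))
    f_star = np.sin(3.0 * X[:, 0])
    # Student-t noise with 1.5 degrees of freedom has infinite variance.
    y = f_star + 0.3 * rng.standard_t(df=1.5, size=n)
    K = gaussian_kernel(X, X, bandwidth=0.3)
    for sigma in (1.0, 1e12):  # a very large sigma mimics least squares
        alpha = kernel_huber_fit(K, y, sigma=sigma, lam=1e-3)
        err = np.mean((K @ alpha - f_star) ** 2)
        print(f"sigma={sigma:g}: in-sample squared error to f_star = {err:.4f}")
```

The printed quantity is the squared distance between the fitted values and the assumed target at the sample points, a rough empirical stand-in for the $L_2$ prediction error discussed above. In this sketch a very large $\sigma$ leaves every weight at 1, so the fit reduces to kernel ridge regression; smaller values of $\sigma$ trade added bias for robustness to large residuals.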

Paper Structure

This paper contains 25 sections, 9 theorems, 107 equations, 2 figures.

Key Result

Theorem 1

Let the moment assumption (Assumption moment_assumption) hold. Under the restriction $\sigma \geq \max(1, M)$, the following inequality holds: [display equation missing from the source] where $C'$ is an absolute positive constant independent of $f$ and $\sigma$.

Figures (2)

  • Figure 1: An illustration of the approach to bounding the prediction error $\|f_\mathbf{z}-f^\star\|_{2,\rho}^2$ when $f_\mathbf{z}$ is produced in \ref{bounded_case} and the hypothesis space $\mathcal{H}$ is uniformly bounded. Here, $f_\sigma$ denotes the sample-free version of $f_\mathbf{z}$ and $f_{\mathcal{H}}$ denotes the projection of $f^\star$ onto $\mathcal{H}$.
  • Figure 2: An illustration of our proposed approach to bounding the prediction error $\|f_\mathbf{z}-f^\star\|_{2,\rho}^2$, where $f_\mathbf{z}$ is produced in \ref{empirical_target_function} and functions in the hypothesis space $\mathcal{H}_K$ are non-uniformly bounded. Here, $f_{\sigma,\lambda}$, defined in \ref{population_version_huber_regularized}, denotes the sample-free version of $f_\mathbf{z}$, and $f_{\lambda}$ is the reference function given in \ref{population_version_ls_regularized}.

Theorems & Definitions (17)

  • Theorem 1
  • Lemma 2
  • proof
  • Proposition 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof
  • Theorem 6
  • ...and 7 more