New Empirical Process Tools and Their Applications to Robust Deep ReLU Networks and Phase Transitions for Nonparametric Regression

Yizhe Ding, Runze Li, Lingzhou Xue

TL;DR

This work develops two Dudley-type maximal inequalities that remove light-tail and uniform boundedness assumptions, enabling nonasymptotic analysis for heavy-tailed noise and non-Donsker function classes. Applied to deep ReLU models, it proves a unified sub-Gaussian concentration bound for deep Huber estimators across all noise regimes and establishes nonasymptotic robustness for deep quantile regression without moment assumptions. The authors extend these tools to nonparametric least squares and generalized linear models under heavy tails, revealing rich phase-transition behavior determined by tail thickness and function class complexity. Additionally, the framework covers broad ERM settings and yields minimax-optimal results for set-structured regression, significantly broadening the applicability of empirical process theory to modern robust and nonparametric learning problems. These results offer finite-sample, distribution-agnostic robustness guarantees with practical implications for robust deep learning under heavy-tailed data.

Abstract

This paper introduces new empirical process tools for analyzing a broad class of statistical learning models under heavy-tailed noise and complex function classes. Our primary contribution is the derivation of two Dudley-type maximal inequalities for expected empirical processes that remove restrictive assumptions such as light tails and uniform boundedness of the function class. These inequalities enlarge the scope of empirical process theory available for statistical learning and nonparametric estimation. Exploiting the new bounds, we establish robustness guarantees for deep ReLU network estimators in Huber and quantile regression. In particular, we prove a unified non-asymptotic sub-Gaussian concentration bound that remains valid even under infinite-variance noise and provide a comprehensive analysis of non-asymptotic robustness for deep Huber estimators across all noise regimes. For deep quantile regression, we provide the first non-asymptotic sub-Gaussian bounds without requiring moment assumptions. As an additional application, our framework yields estimation error bounds for nonparametric least-squares estimators that simultaneously accommodate infinite-variance noise, non-Donsker function classes, and approximation error. Moreover, unlike prior approaches based on specialized multiplier processes, our framework extends to broader empirical risk minimization problems, including nonparametric generalized linear models and "set-structured" models.
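For readers unfamiliar with the two robust losses named in the abstract, the following is a minimal sketch of their standard textbook forms, set beside the squared loss for comparison. The function names, the threshold parameter `tau`, and the quantile level `q` are illustrative assumptions, not notation taken from the paper, whose exact parameterization may differ.

```python
def squared_loss(r):
    """Least-squares loss on a residual r; grows quadratically, so outliers dominate."""
    return 0.5 * r * r


def huber_loss(r, tau=1.0):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond it.

    tau is an assumed name for the robustification threshold.
    """
    a = abs(r)
    if a <= tau:
        return 0.5 * r * r
    return tau * a - 0.5 * tau * tau


def check_loss(r, q=0.5):
    """Standard quantile (check) loss at level q in (0, 1).

    Positive residuals are weighted by q and negative ones by 1 - q.
    """
    indicator = 1.0 if r < 0 else 0.0
    return r * (q - indicator)


if __name__ == "__main__":
    # A single large residual, as might arise under heavy-tailed noise.
    r = 10.0
    print("squared:", squared_loss(r))
    print("huber:  ", huber_loss(r, tau=1.0))
    print("check:  ", check_loss(r, q=0.5))
```

The Huber and check losses grow only linearly in a large residual, which is the usual intuition for why estimators built on them tolerate heavy-tailed noise better than least squares.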

Paper Structure

This paper contains 47 sections, 43 theorems, 317 equations, 3 tables.

Key Result

Theorem 2.1

Let $F\geq 0$ be the envelope function of $\mathcal{F}$, and assume $\|f\|_{L^{2}(P)}\leq\sigma$ for all $f\in\mathcal{F}$. Then,

Theorems & Definitions (70)

  • Theorem 2.1: Informal version of Theorem \ref{theorem: convergence of EP with L^1 integrable functions}
  • Theorem 2.2: Informal version of Theorem \ref{theorem: convergence of EP with L^infty integrable functions}
  • Theorem 2.3: Informal version of Theorem \ref{thm: convergence rate of Huber regression}
  • Theorem 2.4: Informal version of Theorems \ref{theorem: least squares estimation convergence rate with Linfty covering entropy} and \ref{theorem: general ERM convergence rate with Linfty covering entropy}
  • Theorem 3.1
  • Proposition 3.2: Parameterized Class
  • Proposition 3.3: Weighted Uniform Covering Entropy
  • Theorem 3.4
  • Definition 4.1: ReLU networks
  • Theorem 4.3: Huber Regression
  • ...and 60 more