Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach

Nabarun Deb; Debarghya Mukherjee

TL;DR

We present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions. The bound reveals a new phenomenon: even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting, provided the underlying function class is complex enough.

Abstract

Empirical process theory for i.i.d. observations has emerged as a ubiquitous tool for understanding the generalization properties of various statistical problems. However, in many applications where the data exhibit temporal dependencies (e.g., in finance, medical imaging, weather forecasting, etc.), the corresponding empirical processes are much less understood. Motivated by this observation, we present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions. Unlike most prior work, our results cover both the long and the short-range regimes of dependence. Our main result shows that a non-trivial trade-off between the complexity of the underlying function class and the dependence among the observations characterizes the learning rate in a large class of nonparametric problems. This trade-off reveals a new phenomenon, namely that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting, provided the underlying function class is complex enough. We demonstrate the practical implications of our findings by analyzing various statistical estimators in both fixed and growing dimensions. Our main examples include a comprehensive case study of generalization error bounds in nonparametric regression over smoothness classes in fixed as well as growing dimension using neural nets, shape-restricted multivariate convex regression, estimating the optimal transport (Wasserstein) distance between two probability distributions, and classification under the Mammen-Tsybakov margin condition -- all under appropriate mixing assumptions. In the process, we also develop bounds on $L_r$ ($1\le r\le 2$)-localized empirical processes with dependent observations, which we then leverage to get faster rates for (a) tuning-free adaptation, and (b) set-structured learning problems.
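
For orientation, the two objects named in the abstract can be written out in their textbook form. This is a sketch under assumptions: the symbols $\mathcal{F}$, $X_i$, $n$, $\mathcal{A}$, $\mathcal{B}$ and $\beta(k)$ are notation chosen here for illustration, and the paper's own definitions (Definition 2.1 and the main theorem) may differ in normalization or indexing. For a stationary sequence $X_1,\dots,X_n$ and a function class $\mathcal{F}$, the expected supremum of the empirical process is

$$
\mathbb{E}\,\sup_{f\in\mathcal{F}}\left|\frac{1}{n}\sum_{i=1}^{n}\bigl(f(X_i)-\mathbb{E}f(X_1)\bigr)\right|,
$$

and the standard $\beta$-mixing coefficient between two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$, together with its lag-$k$ version for the sequence, is

$$
\beta(\mathcal{A},\mathcal{B})=\mathbb{E}\Bigl[\sup_{B\in\mathcal{B}}\bigl|\mathbb{P}(B\mid\mathcal{A})-\mathbb{P}(B)\bigr|\Bigr],
\qquad
\beta(k)=\sup_{t\ge 1}\,\beta\bigl(\sigma(X_s: s\le t),\,\sigma(X_s: s\ge t+k)\bigr).
$$

The sequence is called $\beta$-mixing when $\beta(k)\to 0$ as $k\to\infty$; for i.i.d. data, $\beta(k)=0$ for every $k\ge 1$.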

Paper Structure (35 sections, 37 theorems, 345 equations, 1 figure)

Introduction
Main contributions
Literature review
Notation
Maximal inequalities with $L_r$ bracketing, $2<r<\infty$
Maximal inequalities with Uniform $L_{\infty}$ bracketing
Localization with $L_r$ bracketing, $1\le r\le 2$
Faster rates with $L_r$-bracketing, $1\le r \le 2$
Rate theorem and localization in learning theory under dependence
Adaptation bounds for complex function classes under mixing assumptions
Set-structured problems with $L_1$-bracketing and mixing assumptions
Applications
Smooth function estimation using deep neural networks
Additive model regression in growing dimension via deep neural networks
Shape constrained multivariate convex least squares
...and 20 more sections

Key Result

Lemma 2.1

The function $\tilde{q}_{n,\beta}(\cdot)$ is well-defined, and always greater than or equal to $1$. Furthermore, both $\tilde{q}_{n,\beta}(\cdot)$ and $\Lambda_{\phi,\beta}(\tilde{q}_{n,\beta}(\cdot))$ are non-decreasing in $\delta$.

Figures (1)

Figure 1: Phase transition curve in terms of $\alpha$ (complexity of the function class) and $\beta$ (the level of dependence) for $\mathcal{L}_\infty$ covers. When the parameters fall in the red or the blue parts of the picture, the rates are the same as in the i.i.d. case, whereas in the green region the effect of dependence shows up. As is evident, the phase transition curve is a smooth parabola up to $\beta=1$, after which the rates are always the same as for i.i.d. observations.
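
To make "the effect of dependence shows up" concrete, the following toy simulation may help. It is a minimal sketch under assumptions and does not reproduce Figure 1 or any result of the paper: it uses a stationary Gaussian AR(1) sequence (a standard example of a geometrically $\beta$-mixing, hence short-range dependent, process) and a simple class of indicator functions $x\mapsto 1\{x\le t\}$, which is far less complex than the classes the figure concerns. The function names (`ar1_sample`, `sup_deviation`, `expected_sup`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def ar1_sample(n, phi, rng):
    """Draw X_1..X_n from a stationary Gaussian AR(1) with N(0, 1) marginals.

    X_t = phi * X_{t-1} + sqrt(1 - phi^2) * eps_t; phi = 0 gives i.i.d. data.
    """
    scale = math.sqrt(1.0 - phi * phi)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def normal_cdf(t):
    """Standard normal CDF, i.e. the true mean of the indicator 1{X <= t}."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def sup_deviation(x, thresholds):
    """Supremum over the indicator class of |empirical mean - true mean|."""
    n = len(x)
    return max(
        abs(sum(1 for v in x if v <= t) / n - normal_cdf(t))
        for t in thresholds
    )


def expected_sup(n, phi, n_rep, seed=0):
    """Monte Carlo estimate of the expected supremum for sample size n."""
    rng = random.Random(seed)
    thresholds = [-2.0 + 0.1 * j for j in range(41)]  # grid on [-2, 2]
    total = 0.0
    for _ in range(n_rep):
        total += sup_deviation(ar1_sample(n, phi, rng), thresholds)
    return total / n_rep


if __name__ == "__main__":
    print("i.i.d. (phi = 0):  ", expected_sup(200, 0.0, 50))
    print("AR(1) (phi = 0.9): ", expected_sup(200, 0.9, 50))
```

For a low-complexity class like this one, strong serial correlation visibly inflates the estimated supremum relative to the i.i.d. case at the same sample size. The phenomenon described in the caption concerns the opposite end of the spectrum, where the function class is complex enough that dependence no longer changes the rate.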

Theorems & Definitions (77)

Definition 2.1: $\beta$-mixing (see doukhan2012mixing)
Definition 2.2: short-range vs long-range dependence
Definition 2.3: Bracketing number
Lemma 2.1
Theorem 2.1: Main theorem
Remark 2.1: Comparison in the i.i.d. case
Corollary 2.1
Corollary 2.2
Remark 2.2: Boundary cases
Remark 2.3: On the $r/(r-2)$ threshold
...and 67 more
