Robust deep learning from weakly dependent data

William Kengne; Modou Wade

TL;DR

This work extends the theory of deep neural networks (DNNs) to robust learning from weakly dependent data by allowing unbounded losses and inputs, assuming only a finite $r$-th moment for $Y$ with $r>1$. It derives non-asymptotic excess-risk bounds for ERM-trained DNNs under strong mixing or $\psi$-weak dependence, with convergence rates linked to the tail parameter $r$ and the Hölder smoothness $s$ of the target, achieving near-i.i.d. rates when $r=\infty$. For targets in $\mathcal{C}^{s,\mathcal{K}}(\mathcal{X})$, the rate is $\mathcal{O}\big((\log n^{(\alpha)})^{3} (n^{(\alpha)})^{-\frac{s}{s+d}(1-\frac{1}{r})}\big)$, and in the i.i.d. case with $r=\infty$ it approaches $\mathcal{O}\big(n^{-s/(s+d)} (\log n)^{3}\big)$. The paper applies these results to robust nonparametric regression and autoregression using $L_1$ and Huber losses, demonstrates robustness to heavy-tailed noise (e.g., $t(2)$, Cauchy), and reports simulations in which the robust estimators outperform least squares in such settings. These findings broaden the applicability of DNN theory to dependent, heavy-tailed data in regression and time-series forecasting.
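To make the dependence of the rate on $r$, $s$ and $d$ concrete, the following minimal Python sketch evaluates the exponent $\frac{s}{s+d}(1-\frac{1}{r})$ and the corresponding bound up to an unspecified constant. The function names and the argument `n_eff` (standing in for $n^{(\alpha)}$, whose definition is not given in this summary) are illustrative assumptions, not notation from the paper.

```python
import math


def rate_exponent(s: float, d: int, r: float) -> float:
    """Exponent kappa in n_eff**(-kappa): s/(s+d) * (1 - 1/r).

    r = math.inf corresponds to moments of every order.
    """
    tail_factor = 1.0 if math.isinf(r) else 1.0 - 1.0 / r
    return s / (s + d) * tail_factor


def rate_bound(n_eff: float, s: float, d: int, r: float) -> float:
    """(log n_eff)^3 * n_eff^(-kappa); the multiplicative constant is omitted."""
    return math.log(n_eff) ** 3 * n_eff ** (-rate_exponent(s, d, r))


if __name__ == "__main__":
    # Hypothetical values: smoothness s = 2, input dimension d = 2.
    for r in (1.5, 2.0, 4.0, math.inf):
        print(f"r = {r}: exponent = {rate_exponent(2.0, 2, r):.4f}")
```

The exponent increases with $r$ and reaches $s/(s+d)$ at $r=\infty$, which is how heavier tails (smaller $r$) translate into a slower rate.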

Abstract

Recent developments in deep learning have established some theoretical properties of deep neural network estimators. However, most existing works on this topic are restricted to bounded loss functions, or to (sub-)Gaussian or bounded inputs. This paper considers robust deep learning from weakly dependent observations, with an unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite moment of order $r$, with $r > 1$. Non-asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$; when the data have moments of any order (that is, $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of Hölder smooth functions with a sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to, or the same as, that obtained with i.i.d. samples. Applications to robust nonparametric regression and robust nonparametric autoregression are considered. A simulation study for models with heavy-tailed errors shows that robust estimators with the absolute loss and the Huber loss outperform the least squares method.
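The three losses compared in the simulation study can be sketched as follows. This is an illustration only: the Huber loss is written in its standard form with a threshold `delta`, which is an assumed parameterization that may differ from the one used in the paper, and the residual values are made up to show the effect of a single outlier.

```python
def l1_loss(u: float) -> float:
    """Absolute loss |u|."""
    return abs(u)


def l2_loss(u: float) -> float:
    """Squared loss u^2 (least squares)."""
    return u * u


def huber_loss(u: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic for |u| <= delta, linear beyond."""
    a = abs(u)
    if a <= delta:
        return 0.5 * u * u
    return delta * (a - 0.5 * delta)


def mean_loss(loss, residuals):
    """Average of a pointwise loss over a list of residuals."""
    return sum(loss(u) for u in residuals) / len(residuals)


if __name__ == "__main__":
    # Made-up residuals: three small errors and one large outlier.
    residuals = [0.1, -0.2, 0.3, 50.0]
    for name, loss in [("L1", l1_loss), ("Huber", huber_loss), ("L2", l2_loss)]:
        print(name, mean_loss(loss, residuals))
```

Because the absolute and Huber losses grow linearly in large residuals, one outlier contributes far less to the empirical risk than under the squared loss, which is the intuition behind their robustness to heavy-tailed errors.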

Paper Structure (10 sections, 2 theorems, 120 equations, 6 figures)

Introduction
Notations, assumptions and feedforward neural networks
Notations and assumptions
Feedforward neural networks
Excess risk bound for the DNN estimator
Robust nonparametric regression
Numerical results
Proofs of the main results
Proof of Theorem 3.1
Proof of Theorem 3.3

Key Result

Theorem 3.1

Assume that (A1), (A2), (A4), (A5) hold and that $h^{*} \in \mathcal{C}^{s, \mathcal{K}}(\mathcal{X})$ for some $s, \mathcal{K} > 0$, where $h^{*}$ is defined in (best_pred_F). Set $L_n = \left(1 - \frac{1}{r}\right) \frac{s L_0}{s + d} \log(n^{(\alpha)})$, $N_n = N_0 (n^{(\alpha)})^{(1 - \frac{1}{r}) \cdots}$ [...] for all $\nu>3$, where $c, \gamma, \overline{\alpha}$ are given in (coef_alpha_mixing) [...]
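As a small worked illustration of the depth choice $L_n$ stated in the theorem, the sketch below evaluates $(1 - \frac{1}{r}) \frac{s L_0}{s+d} \log(n^{(\alpha)})$ for given inputs. The function name, the argument `n_eff` (standing in for $n^{(\alpha)}$) and the default `L0 = 1.0` are assumptions made for illustration; the theorem statement above is truncated, so the width $N_n$ and the remaining quantities are deliberately not reproduced.

```python
import math


def network_depth(n_eff: float, s: float, d: int, r: float, L0: float = 1.0) -> float:
    """Evaluate L_n = (1 - 1/r) * s * L0 / (s + d) * log(n_eff).

    Returned as a float; how L_n is rounded to an integer number of
    layers is not specified here and is left to the caller.
    """
    tail_factor = 1.0 if math.isinf(r) else 1.0 - 1.0 / r
    return tail_factor * s * L0 / (s + d) * math.log(n_eff)


if __name__ == "__main__":
    # Hypothetical values: s = 2, d = 2, r = 2, L0 = 4.
    for n_eff in (250, 500, 1000):
        print(n_eff, network_depth(n_eff, s=2.0, d=2, r=2.0, L0=4.0))
```

The depth grows only logarithmically in the effective sample size, and it shrinks as $r$ decreases toward 1.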

Figures (6)

Figure 1: Boxplots of the empirical $L_1$, Huber and $L_2$ excess risk of the DNN predictors with $n=250, 500$ and 1000 in DGP1 with Student-t error (a) and Gaussian error (b).
Figure 2: Boxplots of the empirical $L_1$, Huber and $L_2$ excess risk of the DNN predictors with $n=250, 500$ and 1000 in DGP2 with Student-t error (a) and Gaussian error (b).
Figure 3: Boxplots of the mean absolute prediction error of the DNN predictors with $n=250, 500$ and 1000 in DGP1 with Student-t error (a) and Gaussian error (b).
Figure 4: Boxplots of the mean absolute prediction error of the DNN predictors with $n=250, 500$ and 1000 in DGP2 with Student-t error (a) and Gaussian error (b).
Figure 5: Boxplots of the root mean square prediction error of the DNN predictors with $n=250, 500$ and 1000 in DGP1 with Student-t error (a) and Gaussian error (b).
...and 1 more figure

Theorems & Definitions (6)

Definition 2.1
Definition 2.2
Theorem 3.1
Remark 3.2
Theorem 3.3
Remark 3.4
