Robust deep learning from weakly dependent data
William Kengne, Modou Wade
TL;DR
This work extends theory for deep neural networks to robust learning from weakly dependent data by allowing unbounded losses and inputs, assuming only a finite $r$-th moment for $Y$ with $r>1$. It derives non-asymptotic excess-risk bounds for ERM-trained DNNs under strong mixing or $\psi$-weak dependence, with convergence rates linked to the tail parameter $r$ and the Hölder smoothness $s$ of the target, achieving near-i.i.d. rates when $r=\infty$. For targets in $\mathcal{C}^{s,\mathcal{K}}(\mathcal{X})$, the rate is $\mathcal{O}\big((\log n^{(\alpha)})^{3} (n^{(\alpha)})^{-\frac{s}{s+d}\left(1-\frac{1}{r}\right)}\big)$, and in i.i.d. cases with $r=\infty$ it approaches $\mathcal{O}\big(n^{-\frac{s}{s+d}} (\log n)^{3}\big)$. The paper applies these results to robust nonparametric regression and autoregression using $L_1$ and Huber losses, demonstrates robustness to heavy-tailed noise (e.g., $t(2)$, Cauchy), and provides simulations showing superior performance over least-squares in such settings. These findings broaden the applicability of DNN theory to dependent, heavy-tailed data in regression and time-series forecasting.
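To make the dependence on $r$ concrete, the polynomial part of the stated rate can be evaluated numerically. The sketch below reads the exponent as $\frac{s}{s+d}\left(1-\frac{1}{r}\right)$, which is the reading consistent with the $r=\infty$ limit quoted above. The function name `rate_exponent` and the sample values of $s$, $d$ and $r$ are illustrative assumptions, not taken from the paper.

```python
import math


def rate_exponent(s: float, d: int, r: float) -> float:
    """Return kappa such that the polynomial part of the rate is n^(-kappa).

    Assumed reading of the stated rate: kappa = s / (s + d) * (1 - 1 / r),
    with s the Holder smoothness, d the input dimension and r > 1 the
    moment order of the output variable (r may be math.inf).
    """
    if r <= 1:
        raise ValueError("the moment order r must be greater than 1")
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    # 1 / inf evaluates to 0.0, so r = math.inf recovers s / (s + d).
    return s / (s + d) * (1.0 - 1.0 / r)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the trend in r.
    for r in (1.5, 2.0, 4.0, math.inf):
        print(f"s=2, d=2, r={r}: exponent = {rate_exponent(2.0, 2, r):.4f}")
```

Under this reading, heavier tails (smaller $r$) shrink the exponent and slow the rate, while $r=\infty$ recovers the exponent $s/(s+d)$ of the i.i.d. case.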
Abstract
Recent developments in deep learning have established some theoretical properties of deep neural network estimators. However, most existing works on this topic are restricted to bounded loss functions or to (sub-)Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite moment of order $r$, with $r > 1$. Non-asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is, $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of Hölder smooth functions with a sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to, or the same as, that obtained with i.i.d. samples. Applications to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that robust estimators with absolute loss and Huber loss functions outperform the least squares method.
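As an illustration of why the absolute and Huber losses are less sensitive to heavy-tailed errors than the squared loss, the following minimal sketch compares how each penalizes a residual. It uses the textbook form of the Huber loss. The threshold name `delta`, its default value, and the sample residuals are assumptions made for this example and may differ from the parameterization used in the paper.

```python
def squared_loss(residual: float) -> float:
    """Least squares loss: grows quadratically in the residual."""
    return residual ** 2


def absolute_loss(residual: float) -> float:
    """Absolute (L1) loss: grows linearly in the residual."""
    return abs(residual)


def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Textbook Huber loss with threshold delta (an assumed parameter).

    Quadratic for |residual| <= delta, linear beyond it, so a single
    extreme observation contributes far less than under squared loss.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude ** 2
    return delta * (magnitude - 0.5 * delta)


if __name__ == "__main__":
    # Hypothetical residuals: small, moderate, and an extreme outlier.
    for res in (0.5, 3.0, 100.0):
        print(res, squared_loss(res), absolute_loss(res), huber_loss(res))
```

For the extreme residual, the squared loss is of order $10^4$ while the absolute and Huber losses stay of order $10^2$, which is the mechanism behind the robustness discussed in the abstract.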
