
Statistical Properties of Deep Neural Networks with Dependent Data

Chad Brown

TL;DR

This work develops a general sieve-based framework for analyzing deep neural network (DNN) estimators under dependent data. It establishes rates of convergence in probability without requiring stationarity, and non-asymptotic $\mathcal{L}^{2}$-error bounds under stationary $\beta$-mixing. It then specializes these results to fully connected DNNs whose depth and width grow with the sample size, with continuous piecewise-linear activations and a Hölder smoothness assumption, deriving rates that scale as $\epsilon_n \asymp n^{-p/(p+d)}$ up to poly-log factors, and extends them to logistic binomial autoregressions. The analysis accommodates unbounded weights and controls network complexity through pseudo-dimension and entropy-with-bracketing arguments, which yields finite-sample guarantees for both nonparametric regression and classification. Approximation results and complexity bounds extend the analysis to alternative activations (e.g., Leaky ReLU) and architectures, broadening applicability to time-series settings and potentially informing inference in models with first-stage DNN components.
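To make the stated rate concrete, the following minimal sketch evaluates $n^{-p/(p+d)}$ for a few input dimensions. It omits the poly-log factors and all constants, so the numbers indicate only how the leading term behaves. The function name `rate` is a hypothetical helper introduced for illustration, not something taken from the paper.

```python
def rate(n: int, p: float, d: int) -> float:
    """Leading-order rate n^{-p/(p+d)}.

    n: sample size, p: Hölder smoothness, d: input dimension.
    Constants and poly-log factors are deliberately omitted (assumption:
    only the polynomial term is of interest here).
    """
    if n < 1 or p <= 0 or d < 1:
        raise ValueError("require n >= 1, p > 0, d >= 1")
    return n ** (-p / (p + d))


if __name__ == "__main__":
    n, p = 10_000, 2.0
    for d in (1, 5, 20):
        # Larger d gives a smaller exponent p/(p+d), hence a slower rate.
        print(f"d={d:2d}  exponent={p / (p + d):.3f}  rate={rate(n, p, d):.4f}")
```

For fixed smoothness $p$, the exponent $p/(p+d)$ shrinks as the dimension $d$ grows, so the leading term decays more slowly in high dimensions.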

Abstract

This paper establishes statistical properties of deep neural network (DNN) estimators under dependent data. Two general results for nonparametric sieve estimators directly applicable to DNN estimators are given. The first establishes rates for convergence in probability under nonstationary data. The second provides non-asymptotic probability bounds on $\mathcal{L}^{2}$-errors under stationary $\beta$-mixing data. I apply these results to DNN estimators in both regression and classification contexts, imposing only a standard Hölder smoothness assumption. The DNN architectures considered are common in applications, featuring fully connected feedforward networks with any continuous piecewise linear activation function, unbounded weights, and a width and depth that grow with sample size. The framework provided also offers potential for research into other DNN architectures and time-series applications.

Paper Structure

This paper contains 32 sections, 28 theorems, 255 equations, and 2 figures.

Key Result

Proposition 1

Let $\mathcal{G}$ be a pointwise-separable class of functions. Then, for any $n\in\mathbb{N}$ and $U_n:\Omega\times \mathbb{R}^n \to \mathbb{R}$ that is measurable-$(\mathcal{A}\otimes\mathcal{B}(\mathbb{R}^n))/\mathcal{B}(\mathbb{R})$, the mappings from $\Omega$ to $\overline{\mathbb{R}}$, are measurable-$\mathcal{A}/\mathcal{B}(\overline{\mathbb{R}})$.

Figures (2)

  • Figure 1: Example of $\mathcal{N}_{n}$ architecture graph structure where $L_n=2$, $H_{n,1} =3$, $H_{n,2}=2$, $W_n=20$, and $d=2$.
  • Figure 2: Example of $\mathcal{N}_{n,\varphi}^{\mathrm{FFN}}$ architecture graph structure where $L_n=2$, $W_n=17$, and $d=2$.
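The Figure 1 caption is consistent with reading $W_n$ as the total number of weights and biases in a fully connected network with $d$ inputs, hidden widths $H_{n,1}, H_{n,2}$, and a scalar output: $(2+1)\cdot 3 + (3+1)\cdot 2 + (2+1)\cdot 1 = 20$. The sketch below checks that arithmetic. This reading of $W_n$ is an assumption inferred from the caption alone, and `count_parameters` is a hypothetical helper; no claim is made about how $W_n$ is counted for the architecture in Figure 2.

```python
from typing import Sequence


def count_parameters(d: int, hidden_widths: Sequence[int], out_dim: int = 1) -> int:
    """Count weights plus biases in a fully connected feedforward network.

    Assumption: each layer has one bias per output unit, and the output
    layer is a scalar by default.
    """
    sizes = [d, *hidden_widths, out_dim]
    # Each layer contributes (fan_in weights + 1 bias) per output unit.
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    # Figure 1 setting: d = 2, H_{n,1} = 3, H_{n,2} = 2.
    print(count_parameters(2, [3, 2]))  # 9 + 8 + 3 = 20
```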

Theorems & Definitions (39)

  • Example 1
  • Example 2
  • Definition 1
  • Proposition 1
  • Proposition 2
  • Definition 2
  • Theorem 1
  • Definition 3
  • Corollary 1: chen_geometric_2000
  • Definition 4
  • ...and 29 more