Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

Krishnakumar Balasubramanian, Nathan Ross

TL;DR

Gaussian approximation bounds in the Wasserstein-$1$ norm are established between the finite-dimensional distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments and their Gaussian limit, assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates.

Abstract

We study the finite-dimensional distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit, assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-(1/6)^{L-1} + \varepsilon}$, for any $\varepsilon > 0$.
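To get a feel for how quickly the stated rate degrades with depth, the exponent $(1/6)^{L-1}$ can be tabulated for a few values of $L$. The snippet below is only an illustrative sketch: the function name `rate_exponent` is hypothetical and not from the paper, and the arbitrarily small $\varepsilon$ term is ignored.

```python
from fractions import Fraction


def rate_exponent(L: int) -> Fraction:
    """Return (1/6)^(L-1), the leading exponent in the abstract's rate
    n^{-(1/6)^{L-1} + eps} for a network with L-1 hidden layers.

    Hypothetical helper for illustration; the eps term is dropped.
    """
    if L < 1:
        raise ValueError("L must be a positive integer")
    return Fraction(1, 6) ** (L - 1)


if __name__ == "__main__":
    # One, two, and three hidden layers correspond to L = 2, 3, 4.
    for L in (2, 3, 4):
        print(f"L = {L}: rate of order n^(-{rate_exponent(L)} + eps)")
```

With one hidden layer ($L = 2$) the exponent is $1/6$, with two hidden layers it is $1/36$, and with three it is $1/216$, so the bound weakens geometrically as depth grows.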

Paper Structure

This paper contains 6 sections, 11 theorems, 85 equations, 1 table.

Key Result

Theorem 1.1

Let $F^{(L)}$ be the DNN defined at eq:fells with centered weights $W_{i,j}^{(\ell)}$ satisfying eq:wellvar which are independent across $i,j,\ell$ with identically distributed rows: $(W_{i,k}^{(\ell)})_{k=1}^{n_\ell}\stackrel{d}{=}(W_{j,k}^{(\ell)})_{k=1}^{n_\ell}$, and with a Lipschitz activation where $G^{(L)}$ is the Gaussian process defined by the covariance recursion eq:gpcov.

Theorems & Definitions (25)

  • Theorem 1.1
  • Remark 1.2
  • Remark 1.3
  • Remark 1.4
  • Remark 1.5
  • Remark 1.6
  • Lemma 2.1
  • Proof of Lemma 2.1
  • Corollary 2.2
  • Remark 2.3
  • ...and 15 more