Quantitative CLTs in Deep Neural Networks

Stefano Favaro; Boris Hanin; Domenico Marinucci; Ivan Nourdin; Giovanni Peccati

TL;DR

This work analyzes fully connected neural networks with Gaussian initial weights and biases, where hidden widths scale with a large parameter $n$, and derives quantitative central limit theorems comparing finite-width networks to the infinite-width Gaussian process limit. The authors establish one-dimensional, finite-dimensional, and functional CLTs using Stein's method, conditional Gaussian representations, and novel coupling techniques, obtaining rates that scale as powers of $n$ (e.g., $n^{-1/2}$ in 1D, $n^{-1/2}$ or $n^{-1/8}$ in higher-dimensional/functional settings). Key contributions include new conditional-Gaussian bounds, convex-distance results for possibly degenerate covariances, and Sobolev-space formulations that enable functional CLTs with explicit width dependence. These results strengthen the theoretical understanding of the finite-width effects in neural networks and provide sharp, width-dependent bounds that improve upon prior work, with implications for initialization stability and feature-learning regimes beyond NTK. The paper also develops a suite of methodological tools (Stein-based bounds, coupling arguments, and operator-perturbation inequalities) that are likely to influence future probabilistic analyses of random neural networks.

Abstract

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show, both for the finite-dimensional distributions and for the entire process, that the distance between a random fully connected network (and its derivatives) and the corresponding infinite-width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
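The convergence described in the abstract can be probed numerically. The following is a minimal sketch, not the authors' method: it samples many independent random networks, evaluates each at one fixed input, and reports the sample excess kurtosis of the scalar output, which is zero for an exact Gaussian. Excess kurtosis is only a crude diagnostic and is not one of the distances bounded in the paper. The ReLU activation, the weight variance `c_w / fan_in`, the bias variance `c_b`, and the helper names `sample_outputs` and `excess_kurtosis` are all assumptions made for illustration.

```python
import numpy as np


def sample_outputs(width, depth, num_samples, x, c_w=2.0, c_b=0.0, seed=0):
    """Sample `num_samples` independent random ReLU networks and return the
    scalar output of each one at the fixed input `x`.

    Assumed initialization (illustrative): weights ~ N(0, c_w / fan_in),
    biases ~ N(0, c_b), `depth` hidden layers of size `width`.
    """
    rng = np.random.default_rng(seed)
    fan_in = x.shape[0]
    # Every sampled network sees the same input: shape (num_samples, fan_in).
    act = np.broadcast_to(x, (num_samples, fan_in))
    for _ in range(depth):
        w = rng.normal(0.0, np.sqrt(c_w / fan_in), size=(num_samples, width, fan_in))
        b = rng.normal(0.0, np.sqrt(c_b), size=(num_samples, width))
        # Batched matrix-vector product, one weight matrix per sampled network.
        pre = np.einsum("sij,sj->si", w, act) + b
        act = np.maximum(pre, 0.0)  # ReLU
        fan_in = width
    w_out = rng.normal(0.0, np.sqrt(c_w / fan_in), size=(num_samples, fan_in))
    b_out = rng.normal(0.0, np.sqrt(c_b), size=num_samples)
    return np.sum(w_out * act, axis=1) + b_out


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2^2 - 3 (zero for a Gaussian)."""
    centered = samples - samples.mean()
    m2 = np.mean(centered**2)
    m4 = np.mean(centered**4)
    return m4 / m2**2 - 3.0


x = np.ones(3)  # fixed input, chosen arbitrarily
for n in (4, 16, 64):
    out = sample_outputs(width=n, depth=1, num_samples=20000, x=x)
    print(n, excess_kurtosis(out))
```

Under these assumptions the printed excess kurtosis should shrink toward zero as the width grows, which is the qualitative behavior the quantitative CLTs make precise. The memory cost grows like `num_samples * width**2` once `depth` exceeds 1, so deeper experiments need fewer samples or batching.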

Paper Structure (28 sections, 17 theorems, 101 equations)

Introduction
Informal Overview of Results
Outline for Remainder of Article
Assumptions and Definitions
Main Results
Notation and Setting for Main Results
One-dimensional bounds
Finite-dimensional bounds
Functional Bounds
Random fields as random elements
Bounds in Sobolev spaces
Embedding of smooth non-linearities
Related Work
Preparatory results
Variance estimates
...and 13 more sections

Key Result

Theorem 1.3

Fix $L, n_0, n_{L+1}, r\geq 1$ and a non-linearity $\sigma:\mathbb{R}\rightarrow \mathbb{R}$ that is polynomially bounded to order $r$ in the sense of the forthcoming formula (eq:sigma-regg). As $n_1,\ldots, n_L\rightarrow \infty$, the random field $x_\alpha\in \mathbb{R}^{n_0}\mapsto z_\alpha^{(L+1)}$ [...] satisfying [...] where for any $f:\mathbb{R}^2\rightarrow \mathbb{R}$ we've written [...]

Theorems & Definitions (46)

Definition 1.1: Fully Connected Network
Definition 1.2: Random Fully Connected Neural Network
Theorem 1.3: Infinite Networks as Gaussian Processes (neal1996, Lee2018, Matt2018, Yang2020, Bracale2021, hanin2021random)
Remark 1.4
Definition 2.1: Polynomially Bounded Activations
Remark 2.2
Remark 2.3
Definition 2.4
Lemma 2.5
Remark 2.6
...and 36 more
