Quantitative CLTs in Deep Neural Networks

Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati

TL;DR

This work analyzes fully connected neural networks with Gaussian initial weights and biases, where hidden widths scale with a large parameter $n$, and derives quantitative central limit theorems comparing finite-width networks to the infinite-width Gaussian process limit. The authors establish one-dimensional, finite-dimensional, and functional CLTs using Stein's method, conditional Gaussian representations, and novel coupling techniques, obtaining rates that scale as powers of $n$ (e.g., $n^{-1/2}$ in 1D, $n^{-1/2}$ or $n^{-1/8}$ in higher-dimensional/functional settings). Key contributions include new conditional-Gaussian bounds, convex-distance results for possibly degenerate covariances, and Sobolev-space formulations that enable functional CLTs with explicit width dependence. These results strengthen the theoretical understanding of the finite-width effects in neural networks and provide sharp, width-dependent bounds that improve upon prior work, with implications for initialization stability and feature-learning regimes beyond NTK. The paper also develops a suite of methodological tools (Stein-based bounds, coupling arguments, and operator-perturbation inequalities) that are likely to influence future probabilistic analyses of random neural networks.

Abstract

We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show, both for the finite-dimensional distributions and for the entire process, that the distance between a random fully connected network (and its derivatives) and the corresponding infinite-width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.

Paper Structure (28 sections, 17 theorems, 101 equations)

Key Result

Theorem 1.3

Fix $L, n_0, n_{L+1}, r\geq 1$ and a non-linearity $\sigma:\mathbb{R}\rightarrow \mathbb{R}$ that is polynomially bounded to order $r$ in the sense of the forthcoming formula (eq:sigma-regg). As $n_1,\ldots, n_L\rightarrow \infty$, the random field $x_\alpha\in \mathbb{R}^{n_0}\mapsto z_\alpha^{(L+1)}$ … satisfying … where for any $f:\mathbb{R}^2\rightarrow \mathbb{R}$ we've written $\left\langle f(z_{i\ldots})\right\rangle$ …
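The Gaussian limit described above can be probed numerically. The following is a minimal sketch, not the paper's method: it samples many independent random networks at one fixed input and reports the excess kurtosis of the scalar output, which is zero for an exactly Gaussian variable. The parameterization is an assumption made for illustration (weights with variance `c_w / fan_in`, biases with variance `c_b`, `tanh` activation), and the names `sample_outputs` and `excess_kurtosis` are hypothetical. Excess kurtosis is only a crude diagnostic and is not one of the distances for which the paper proves bounds.

```python
import numpy as np


def sample_outputs(x, width, depth, num_samples, seed=0, c_w=1.0, c_b=0.0):
    """Scalar outputs of `num_samples` independent random networks at input x.

    Assumed setup (illustrative, not taken from the source): `depth` hidden
    layers of equal `width`, weights ~ N(0, c_w / fan_in), biases ~ N(0, c_b),
    tanh activation, one output neuron.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    fan_in = x.shape[0]

    # First-layer pre-activations, one row per sampled network.
    w = rng.normal(0.0, np.sqrt(c_w / fan_in), size=(num_samples, width, fan_in))
    b = rng.normal(0.0, np.sqrt(c_b), size=(num_samples, width))
    z = w @ x + b  # shape (num_samples, width)

    # Remaining hidden layers (none when depth == 1).
    for _ in range(depth - 1):
        h = np.tanh(z)
        w = rng.normal(0.0, np.sqrt(c_w / width), size=(num_samples, width, width))
        b = rng.normal(0.0, np.sqrt(c_b), size=(num_samples, width))
        z = np.einsum("sij,sj->si", w, h) + b

    # Output layer: a single neuron per network.
    h = np.tanh(z)
    w_out = rng.normal(0.0, np.sqrt(c_w / width), size=(num_samples, width))
    b_out = rng.normal(0.0, np.sqrt(c_b), size=num_samples)
    return np.sum(w_out * h, axis=1) + b_out


def excess_kurtosis(samples):
    """Empirical fourth standardized moment minus 3 (zero for a Gaussian)."""
    centered = samples - samples.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / var ** 2 - 3.0


if __name__ == "__main__":
    for n in (2, 8, 32, 128):
        out = sample_outputs([1.0], width=n, depth=1, num_samples=20000, seed=n)
        print(n, excess_kurtosis(out))
```

Under these assumptions the printed values should shrink toward zero as the width grows, up to Monte Carlo noise. Note that memory grows as `num_samples * width**2` when `depth > 1`, so larger widths need smaller sample counts or batching.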

Theorems & Definitions (46)

  • Definition 1.1: Fully Connected Network
  • Definition 1.2: Random Fully Connected Neural Network
  • Theorem 1.3: Infinite Networks as Gaussian Processes (neal1996, Lee2018, Matt2018, Yang2020, Bracale2021, hanin2021random)
  • Remark 1.4
  • Definition 2.1: Polynomially Bounded Activations
  • Remark 2.2
  • Remark 2.3
  • Definition 2.4
  • Lemma 2.5
  • Remark 2.6
  • ...and 36 more