Large-width functional asymptotics for deep Gaussian neural networks

Daniele Bracale, Stefano Favaro, Sandra Fortini, Stefano Peluchetti

TL;DR

This work develops a function-space framework for studying infinitely wide deep Gaussian neural networks, i.e. fully connected networks whose weights and biases are Gaussian. The authors treat networks as stochastic processes on ${\mathbb R}^I$ and use Lévy's theorem, the Daniell–Kolmogorov extension theorem, and Kolmogorov–Chentsov arguments. With these tools they show that the networks converge to continuous Gaussian processes in the large-width limit, and that when the activation is Lipschitz the sample paths are locally $\gamma$-Hölder for any $0<\gamma<1$. They derive an explicit recursion for the limiting covariance $\Sigma(l)$, establishing that the limit of any fixed unit is Gaussian with $\Sigma(1)_{ij}=\sigma_b^2+\sigma_\omega^2\langle x^{(i)},x^{(j)}\rangle$ and $\Sigma(l)_{ij}=\sigma_b^2+\sigma_\omega^2\int\phi(u)\phi(v)\, q^{(l-1)}(du,dv)$, where $q^{(l-1)}=N_k(0,\Sigma(l-1))$ and $x^{(1)},\dots,x^{(k)}$ are the $k$ inputs. The vector of all units converges to a product of Gaussian laws, one per unit, so the units are independent in the limit. This enables a rigorous weak convergence result with respect to a stronger function-space metric. Overall, the paper strengthens the theoretical connection between infinitely wide deep networks and Gaussian processes and lays groundwork for GP-based analysis of broader neural-network architectures.
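The covariance recursion can be evaluated numerically. The sketch below assumes a ReLU activation $\phi(u)=\max(u,0)$, which is Lipschitz. For ReLU, the two-dimensional Gaussian integral has the well-known arc-cosine closed form; other activations would need numerical quadrature instead. The integrand involves only two coordinates, so each entry $(i,j)$ depends only on the $2\times 2$ block $(i,j)$ of $\Sigma(l-1)$. The function names `relu_pair_expectation` and `limiting_covariances` are hypothetical. $\Sigma(1)$ follows the TL;DR formula, with no $1/I$ normalization.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def relu_pair_expectation(a: float, b: float, c: float) -> float:
    """E[relu(u) * relu(v)] for (u, v) ~ N(0, [[a, c], [c, b]]).

    Closed form for ReLU (degree-1 arc-cosine kernel); assumes a, b >= 0.
    """
    if a <= 0.0 or b <= 0.0:
        return 0.0  # a zero-variance coordinate is a.s. 0, so the product vanishes
    norm = math.sqrt(a * b)
    rho = max(-1.0, min(1.0, c / norm))  # clamp floating-point drift outside [-1, 1]
    theta = math.acos(rho)
    return norm / (2.0 * math.pi) * (math.sin(theta) + (math.pi - theta) * math.cos(theta))


def limiting_covariances(xs: Sequence[Sequence[float]], depth: int,
                         sigma_b2: float, sigma_w2: float) -> List[Matrix]:
    """Return [Sigma(1), ..., Sigma(depth)] for the k inputs in `xs` (ReLU activation)."""
    k = len(xs)
    # Layer 1: Sigma(1)_ij = sigma_b^2 + sigma_w^2 <x^(i), x^(j)>
    sigma = [[sigma_b2 + sigma_w2 * sum(p * q for p, q in zip(xs[i], xs[j]))
              for j in range(k)] for i in range(k)]
    layers = [sigma]
    for _ in range(2, depth + 1):
        prev = layers[-1]
        # Layer l: the integral only sees the 2x2 block (i, j) of Sigma(l-1)
        sigma = [[sigma_b2 + sigma_w2 * relu_pair_expectation(prev[i][i], prev[j][j], prev[i][j])
                  for j in range(k)] for i in range(k)]
        layers.append(sigma)
    return layers
```

For example, take two orthonormal inputs with $\sigma_b^2=0$ and $\sigma_\omega^2=1$. Here $\Sigma(1)$ is the identity, and $\Sigma(2)$ has diagonal entries $1/2$ and off-diagonal entries $1/(2\pi)$.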

Abstract

In this paper, we consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions. Extending previous results (Matthews et al., 2018a;b; Yang, 2019), we adopt a function-space perspective, i.e. we view neural networks as infinite-dimensional random elements on the input space $\mathbb{R}^I$. Under suitable assumptions on the activation function we show that: i) a network defines a continuous Gaussian process on the input space $\mathbb{R}^I$; ii) a network with re-scaled weights converges weakly to a continuous Gaussian process in the large-width limit; iii) the limiting Gaussian process has almost surely locally $\gamma$-Hölder continuous paths, for $0<\gamma<1$. Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes by establishing weak convergence in function space with respect to a stronger metric.
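Item iii) states that every exponent $\gamma<1$ is admissible. The standard argument for this is sketched below. It may differ from the paper's own proof, and it makes three assumptions: $\phi$ is Lipschitz with a constant written $L_\phi$, the limit is the centered Gaussian process given by the covariance recursion in the TL;DR, and the Kolmogorov–Chentsov theorem is used in its usual form. In that form, $\mathbb{E}|f(x)-f(y)|^{\alpha}\le C\|x-y\|^{I+\beta}$ yields a continuous version with locally $\gamma$-Hölder paths for every $\gamma<\beta/\alpha$. Covariance entries are indexed here by the inputs $x,y$ rather than by $i,j$.

```latex
% Step 1 (layer 1): f^{(1)}(x)-f^{(1)}(y) = \omega^\top (x-y), so the bias cancels and
\mathbb{E}\big|f^{(1)}(x)-f^{(1)}(y)\big|^2 = \sigma_\omega^2\,\|x-y\|^2 .
% Step 2 (layer l >= 2, (u,v) ~ N(0, 2x2 block of Sigma(l-1) at x, y)):
\mathbb{E}\big|f^{(l)}(x)-f^{(l)}(y)\big|^2
  = \Sigma(l)_{xx}+\Sigma(l)_{yy}-2\,\Sigma(l)_{xy}
  = \sigma_\omega^2\,\mathbb{E}\big|\phi(u)-\phi(v)\big|^2
  \le \sigma_\omega^2 L_\phi^2\,\mathbb{E}\,|u-v|^2 ,
% and E|u-v|^2 is the layer-(l-1) increment, so by induction E|f(x)-f(y)|^2 <= L ||x-y||^2.
% Step 3 (Gaussian increments): for every integer m with 2m > I,
\mathbb{E}\,|f(x)-f(y)|^{2m} = (2m-1)!!\,\big(\mathbb{E}\,|f(x)-f(y)|^2\big)^{m}
  \le (2m-1)!!\,L^{m}\,\|x-y\|^{I+(2m-I)} .
% Step 4 (Kolmogorov--Chentsov with alpha = 2m, beta = 2m - I): locally gamma-Hölder paths for
\gamma < \frac{2m-I}{2m} = 1-\frac{I}{2m} \xrightarrow[m\to\infty]{} 1 .
```

Because $m$ can be taken arbitrarily large, every $\gamma\in(0,1)$ is reached. The Gaussianity of the limit is what makes the higher moments in Step 3 computable from the second moment alone.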

Paper Structure

This paper contains 7 sections, 14 theorems, 127 equations.

Key Result

Proposition 1

Suppose that $f$ and $(f(n))_{n \geq 1}$ are random elements in $C({\mathbb R}^I; S)$, where $(S,d)$ is a Polish space. Then $f(n)\stackrel{d}{\rightarrow}f$ if: i) $f(n)\stackrel{f_d}{\rightarrow}f$, i.e. the finite-dimensional distributions of $f(n)$ converge to those of $f$, and ii) the sequence $(f(n))_{n \geq 1}$ is uniformly tight.
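Condition i) can be illustrated in a very restricted case: a single input, a single moment, and an exactly computable quantity. The setup is an assumption chosen for illustration. It has a ReLU activation, an unscaled first layer matching the $\Sigma(1)$ formula in the TL;DR, and second-layer weights re-scaled to variance $\sigma_\omega^2/n$ for width $n$. Conditional on the first layer, a second-layer unit is then $N(0,V)$ with $V=\sigma_b^2+(\sigma_\omega^2/n)\sum_j \phi(f^{(1)}_j(x))^2$. The unit is therefore a Gaussian scale mixture, and its excess kurtosis is $3\,\mathrm{Var}(V)/\mathbb{E}[V]^2$, which shrinks like $1/n$. The helper name `second_layer_excess_kurtosis` is hypothetical. This does not check tightness, condition ii).

```python
def second_layer_excess_kurtosis(x, width, sigma_b2=1.0, sigma_w2=1.0):
    """Exact excess kurtosis of one second-layer unit f2(x) of a ReLU network.

    Assumed (hypothetical) setup, for illustration only:
      layer 1: f1_j(x) = b_j + sum_i w_ji x_i,    b_j ~ N(0, sigma_b2), w_ji ~ N(0, sigma_w2)
      layer 2: f2(x) = b + sum_j w_j relu(f1_j(x)), b ~ N(0, sigma_b2), w_j ~ N(0, sigma_w2 / width)
    Given layer 1, f2(x) ~ N(0, V) with V = sigma_b2 + (sigma_w2 / width) * sum_j relu(f1_j(x))**2,
    so the excess kurtosis of f2(x) equals 3 * Var(V) / E[V]**2.
    """
    s = sigma_b2 + sigma_w2 * sum(xi * xi for xi in x)  # variance of each f1_j(x)
    mean_relu2 = s / 2.0                                # E[relu(u)^2] for u ~ N(0, s)
    var_relu2 = 1.5 * s * s - mean_relu2 ** 2           # uses E[relu(u)^4] = 3 s^2 / 2
    mean_v = sigma_b2 + sigma_w2 * mean_relu2
    var_v = sigma_w2 ** 2 * var_relu2 / width           # `width` i.i.d. terms, each scaled by sigma_w2 / width
    return 3.0 * var_v / mean_v ** 2


for n in (1, 10, 100, 1000):
    print(n, second_layer_excess_kurtosis([1.0], width=n))
```

With $x=(1.0)$ and unit variances, the formula reduces to $3.75/n$. The marginal moves toward a Gaussian (excess kurtosis $0$) as the width grows, which is consistent with finite-dimensional convergence at that input.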

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Definition 3: convergence in distribution
  • Proposition 1: convergence in distribution in $C({\mathbb R}^I; S)$, $(S,d)$ Polish
  • Proposition 2: continuous version and local-Hölderianity, $(S,d)$ complete
  • Proposition 3: uniform tightness in $C({\mathbb R}^I; S)$, $(S,d)$ Polish
  • Lemma 1: finite-dimensional limit
  • proof
  • Lemma 2: continuity
  • proof
  • ...and 17 more