Large Deviations of Gaussian Neural Networks with ReLU activation

Quirin Vogel


TL;DR

The work proves a large deviation principle for deep neural networks with Gaussian weights and at most linearly growing activations, notably ReLU. This extends prior results that required bounded, continuous activations. To handle moment generating functions that are not finite everywhere, the proof replaces the Gärtner–Ellis framework with exponential equivalence and the multidimensional Cramér theorem. The work also gives a simplified rate function and a ReLU-specific power-series expansion that makes minimizing the rate function practical in high dimensions. The arguments rest on a precise analysis of the Laplace transform, an inductive LDP across layers, and a refined characterization of the rate function via Moore–Penrose inverses, which together give a rigorous asymptotic description of tail behavior in wide networks. The results are potentially relevant to uncertainty quantification and tail-risk assessment for deep Gaussian models, and the explicit series expansions provide computationally tractable tools for ReLU-based architectures.

Abstract

We prove a large deviation principle for deep neural networks with Gaussian weights and at most linearly growing activation functions, such as ReLU. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are the most commonly used. We furthermore simplify previous expressions for the rate function and provide a power-series expansion for the ReLU case.
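Why the bounded-activation assumption is restrictive can already be seen in one dimension. As a minimal sketch of our own (not the paper's construction), let $X\sim N(0,1)$ and $Y=\sigma(X)^2$ with $\sigma$ the ReLU. Then $\mathbb{E}[e^{\theta Y}] = \tfrac12 + \tfrac12(1-2\theta)^{-1/2}$ for $\theta<1/2$ and $+\infty$ otherwise, so the log-moment generating function $\Lambda$ is finite only on a half-line. Because it is still finite in a neighbourhood of $\theta=0$, the one-dimensional Cramér theorem gives an LDP for empirical means of i.i.d. copies of $Y$ with rate $\Lambda^*(y)=\sup_\theta(\theta y-\Lambda(\theta))$. The helper names `log_mgf_relu_sq` and `cramer_rate` are hypothetical.

```python
import math


def log_mgf_relu_sq(theta: float) -> float:
    """Lambda(theta) = log E[exp(theta * relu(X)**2)] for X ~ N(0, 1).

    Illustrative only: shows that the moment generating function of the
    squared ReLU of a Gaussian is finite only for theta < 1/2.
    """
    if theta >= 0.5:
        return math.inf  # E[exp(theta * X**2)] diverges for theta >= 1/2
    # With probability 1/2, X < 0 and relu(X)**2 = 0 (contributes 1/2).
    # On {X >= 0}, symmetry gives half of E[exp(theta X^2)] = (1 - 2 theta)^(-1/2).
    return math.log(0.5 + 0.5 / math.sqrt(1.0 - 2.0 * theta))


def cramer_rate(y: float, lo: float = -50.0, iters: int = 200) -> float:
    """Legendre transform sup_theta [theta * y - Lambda(theta)] for y > 0.

    theta -> theta * y - Lambda(theta) is concave, so a ternary search on
    [lo, 1/2) finds the supremum, assuming the maximiser lies in that range.
    """
    if y <= 0:
        raise ValueError("this sketch only handles y > 0")
    hi = 0.5 - 1e-12  # stay just inside the domain where Lambda is finite

    def objective(t: float) -> float:
        return t * y - log_mgf_relu_sq(t)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1  # the maximum lies in [m1, hi]
        else:
            hi = m2  # the maximum lies in [lo, m2]
    return objective((lo + hi) / 2)
```

The rate vanishes at the mean $\mathbb{E}[Y]=1/2$ and is positive elsewhere. A ternary search is enough here because the objective is concave in $\theta$.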


Paper Structure (9 sections, 9 theorems, 45 equations, 2 figures)

This paper contains 9 sections, 9 theorems, 45 equations, 2 figures.

Introduction and Results
Definition of the model
Our contributions
Results
Proof
Analysis of the Laplace transform
Inductive large deviations
Analysis of the rate function
A power-series expansion for ReLU

Key Result

Theorem 1

Fix a finite set $A$ and points $x_\alpha\in\mathbb{R}^{n_0}$ for $\alpha\in A$. The random vector $\left(Z^{(L+1)}(x_\alpha)\right)_{\alpha\in A}$ then induces the linear map … Alternatively, we can interpret $Z_{\underline{x}}$ as a random element in $\mathbb{R}^A\otimes\mathbb{R}^{n_{L+1}}$, with … where the superscript $+$ denotes the Moore–Penrose inverse of a matrix; see Penrose (1955).
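The covariance matrix of $\left(Z^{(L+1)}(x_\alpha)\right)_{\alpha\in A}$ can be singular (for instance, when two of the $x_\alpha$ coincide), so an ordinary inverse is not available. For a centred Gaussian vector $G\sim N(0,C)$, the standard fix is that $\sqrt{\varepsilon}\,G$ satisfies an LDP as $\varepsilon\to 0$ with rate $\tfrac12\langle z, C^+ z\rangle$ for $z$ in the range of $C$ and $+\infty$ otherwise. The sketch below implements only this textbook formula. We assume, without checking against the full statement above, that this is the way the Moore–Penrose inverse enters the theorem's rate function. The function name `gaussian_rate` is hypothetical.

```python
import numpy as np


def gaussian_rate(z: np.ndarray, C: np.ndarray, tol: float = 1e-10) -> float:
    """Rate 0.5 * <z, C^+ z> of a centred Gaussian with covariance C.

    Returns +inf when z has a component outside the range of C, i.e. along
    a direction in which the Gaussian has zero variance.
    """
    C_pinv = np.linalg.pinv(C)  # Moore-Penrose inverse C^+
    # C @ C^+ is the orthogonal projection onto range(C),
    # so z lies in range(C) iff projecting leaves it unchanged.
    if not np.allclose(C @ C_pinv @ z, z, atol=tol):
        return float("inf")
    return 0.5 * float(z @ C_pinv @ z)
```

For example, with $C=\mathrm{diag}(2,0)$ the point $z=(2,0)$ has rate $1$, while $z=(0,1)$, which points in the zero-variance direction, has rate $+\infty$.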

Figures (2)

Figure 1: A schematic representation of the network for the case $L=3$. Here, the hidden layers ($l\in\left\{1,\ldots,L\right\}$) all have the same size, which need not be the case for the main theorems. The underlying randomness comes from the biases $b$ and the weights $W$.
Figure 2: Left: ReLU $\sigma(x)=x\mathbbm{1}\left\{x\ge 0\right\}$; middle: $\kappa$ for ReLU and $q=1$; right: $\kappa^*$ for ReLU and $q=1$.
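The network in Figure 1 gets all of its randomness from the biases $b$ and the weights $W$. The sketch below draws one such network with ReLU hidden layers and evaluates the output layer $Z^{(L+1)}$ at a finite set of inputs $\{x_\alpha\}_{\alpha\in A}$, as in Theorem 1. The variance scaling is our assumption, chosen because it is a common convention for wide networks rather than because the paper states it: weights of layer $l$ are i.i.d. $N(0, c_w/n_{l-1})$ and biases are i.i.d. $N(0, c_b)$. The function name and parameters are hypothetical.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU activation sigma(x) = x * 1{x >= 0}."""
    return np.maximum(x, 0.0)


def sample_network_outputs(X, widths, c_w=1.0, c_b=0.0, rng=None):
    """Hypothetical sketch: draw one Gaussian ReLU network, return Z^{(L+1)}(x_alpha).

    X      : array of shape (|A|, n_0), one row per input x_alpha
    widths : [n_1, ..., n_L, n_{L+1}] (hidden widths followed by output width)
    Assumed scaling: weights i.i.d. N(0, c_w / n_{l-1}), biases i.i.d. N(0, c_b).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = X
    for l, n in enumerate(widths):
        # The raw inputs feed the first layer; later layers see sigma(Z^{(l)}).
        inp = X if l == 0 else relu(z)
        n_prev = inp.shape[1]
        W = rng.normal(0.0, np.sqrt(c_w / n_prev), size=(n_prev, n))
        b = rng.normal(0.0, np.sqrt(c_b), size=n)
        z = inp @ W + b  # pre-activations of the next layer, shape (|A|, n)
    return z  # no activation on the output layer Z^{(L+1)}
```

Setting `widths=[n] * L + [n_out]` reproduces the equal-width case drawn in Figure 1.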

Theorems & Definitions (17)

Example
Theorem 1
Lemma 1.1
Lemma 2.1
proof
Lemma 2.2
proof
Lemma 2.3
proof
Lemma 2.4
...and 7 more
