Large and moderate deviations for Gaussian neural networks

Claudio Macci; Barbara Pacchiarotti; Giovanni Luca Torrisi

TL;DR

The paper analyzes large deviations and moderate deviations for the output of Gaussian fully connected neural networks as layer widths grow, covering deep architectures with bounded and continuous pre-activation functions, and extending to single-input ReLU networks as well as shallow networks with general activations. A key tool is a recursive representation that expresses the network output as a centered Gaussian with a random covariance, enabling an inductive large-deviation analysis across layers. The main results provide a large-deviation principle with speed $v_n^*$ and a moderate-deviation principle with speed $1/a_n$, with rate functions expressed via a covariance recursion $\widehat{g}_{\underline{x}}^{(\ell)}$ and the Fenchel–Legendre transform $\kappa^*(\cdot;q)$; explicit forms are available when invertibility holds, and special treatments are given for ReLU with a single input. The findings quantify the exponential-scale probabilities of rare network-output events (and sensitivities in shallow/derivative cases), offering theoretical insight into the initialization-driven stochastic behavior of Gaussian nets and potential implications for training dynamics.
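For reference, the two notions the summary relies on can be written out using their standard textbook definitions. This is a sketch and not a quotation from the paper: $Z_n$, $I$, $C$, $O$ and $f$ are generic placeholder symbols. A sequence $\{Z_n\}_n$ satisfies the large-deviation principle with speed $v_n \to \infty$ and rate function $I$ if

$$
\limsup_{n\to\infty} \frac{1}{v_n}\log P(Z_n \in C) \le -\inf_{z\in C} I(z) \quad \text{for all closed } C,
\qquad
\liminf_{n\to\infty} \frac{1}{v_n}\log P(Z_n \in O) \ge -\inf_{z\in O} I(z) \quad \text{for all open } O,
$$

and $I$ is called good when it is lower semicontinuous with compact level sets $\{z : I(z) \le c\}$. The Fenchel–Legendre transform of a function $f$ on $\mathbb{R}$ is

$$
f^*(y) = \sup_{\theta \in \mathbb{R}} \{\theta y - f(\theta)\}.
$$

In the notation $\kappa^*(\cdot;q)$ the transform is presumably taken in the first argument with $q$ held fixed; the paper's definition of $\kappa$ itself is not reproduced on this page.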

Abstract

We prove large and moderate deviations for the output of Gaussian fully connected neural networks. The main achievements concern deep neural networks (i.e., when the model has more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we have results even if the pre-activation is ReLU. When the network is shallow (i.e., there is exactly one hidden layer), the large and moderate deviation principles hold for quite general pre-activation functions.
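A minimal simulation sketch of the objects involved, using only the Python standard library. The scaling used here (weights with variance `c_w / n_in`, biases with variance `c_b`) is an assumed, commonly used convention for Gaussian networks at initialization and is not taken from this page; the function names, the constants, and the choice of `tanh` as a bounded continuous activation are likewise illustrative. The sketch shows the idea behind the recursive representation: conditionally on the last hidden layer, each output coordinate is a centered Gaussian whose variance depends on that (random) hidden layer.

```python
import math
import random


def gaussian_layer(inputs, n_out, c_w, c_b, rng):
    """One random affine layer (assumed scaling):
    weights ~ N(0, c_w / n_in), biases ~ N(0, c_b), all independent."""
    n_in = len(inputs)
    w_std = math.sqrt(c_w / n_in)
    b_std = math.sqrt(c_b)
    return [
        rng.gauss(0.0, b_std) + sum(rng.gauss(0.0, w_std) * v for v in inputs)
        for _ in range(n_out)
    ]


def hidden_preactivations(x, widths, c_w, c_b, rng, sigma=math.tanh):
    """Pre-activations of the last hidden layer for a single input x."""
    z = gaussian_layer(x, widths[0], c_w, c_b, rng)
    for n_out in widths[1:]:
        z = gaussian_layer([sigma(v) for v in z], n_out, c_w, c_b, rng)
    return z


def conditional_output_variance(hidden, c_w, c_b, sigma=math.tanh):
    """Variance of one output coordinate given the last hidden layer:
    c_b + c_w * (average of sigma(z)^2 over the hidden units)."""
    return c_b + c_w * sum(sigma(z) ** 2 for z in hidden) / len(hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = hidden_preactivations([1.0, -0.5], [50, 50], 1.0, 0.1, rng)
    var = conditional_output_variance(hidden, 1.0, 0.1)
    # Draw many independent output coordinates with the hidden layer frozen.
    outs = gaussian_layer([math.tanh(z) for z in hidden], 4000, 1.0, 0.1, rng)
    empirical = sum(o * o for o in outs) / len(outs)
    print("conditional variance:", var)
    print("empirical second moment:", empirical)
```

With the hidden layer frozen, the empirical second moment of the sampled outputs should be close to the conditional variance. The randomness of that variance across hidden-layer draws is what an inductive, layer-by-layer analysis has to control.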

Paper Structure (13 sections, 9 theorems, 91 equations)


Introduction
Main results: statements and remarks
Proofs of the main results
Preliminaries on large deviations
An important representation lemma
Proof of Theorem 2.1
Proof of Theorem 2.2
Large and moderate deviations of deep neural networks with ReLU pre-activation and single input
Modifications of the proofs of Theorems 2.1 and 2.2
An explicit expression of $\kappa^*(\cdot;q)$
Results for shallow neural networks and their sensitivities
Acknowledgements.
Funding.

Key Result

Theorem 2.1

Assume that Conditions cond:on-sigma and cond:on-n1-nL hold. Then the sequence $\{(Z_h^{(L+1)}(x_\alpha)/\sqrt{v_n^*})_{\alpha h}\}_n$ satisfies the LDP on $\mathbb{R}^{|A|\times n_{L+1}}$, with speed $v_n^*$ and good rate function $I_{Z^{(L+1)}(\underline{x})}$ defined by where: $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^{|A|\times n_{L+1}}$, $I_{G^{(L)}(\underline{x})}$ is defined by (fo

Theorems & Definitions (21)

Theorem 2.1: Large deviations
Theorem 2.2: Moderate deviations
Remark 2.1
Remark 2.2
Remark 2.3
Definition 3.1
Proposition 3.1
Lemma 3.1
Lemma 3.2: Lemma 1 in Giuliano, Macci and Pacchiarotti
Lemma 3.3
...and 11 more
