Large and moderate deviations for Gaussian neural networks

Claudio Macci, Barbara Pacchiarotti, Giovanni Luca Torrisi

TL;DR

The paper analyzes large deviations and moderate deviations for the output of Gaussian fully connected neural networks as layer widths grow, covering deep architectures with bounded and continuous pre-activation functions, and extending to single-input ReLU networks as well as shallow networks with general activations. A key tool is a recursive representation that expresses the network output as a centered Gaussian with a random covariance, enabling an inductive large-deviation analysis across layers. The main results provide a large-deviation principle with speed $v_n^*$ and a moderate-deviation principle with speed $1/a_n$, with rate functions expressed via a covariance recursion $\widehat{g}_{\underline{x}}^{(\ell)}$ and the Fenchel–Legendre transform $\kappa^*(\cdot;q)$; explicit forms are available when invertibility holds, and special treatments are given for ReLU with a single input. The findings quantify the exponential-scale probabilities of rare network-output events (and sensitivities in shallow/derivative cases), offering theoretical insight into the initialization-driven stochastic behavior of Gaussian nets and potential implications for training dynamics.

Abstract

We prove large and moderate deviations for the output of Gaussian fully connected neural networks. The main achievements concern deep neural networks (i.e., when the model has more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we have results even if the pre-activation is ReLU. When the network is shallow (i.e., there is exactly one hidden layer), the large and moderate deviation principles hold for quite general pre-activation functions.
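
The recursive representation mentioned in the TL;DR can be illustrated numerically. The sketch below assumes the common Gaussian initialization in which biases are $N(0, C_b)$ and the weights feeding a layer with $n$ inputs are $N(0, C_W/n)$; under that assumption, the pre-activations of each layer are, conditionally on the previous layer, independent centered Gaussian vectors with a random covariance. The function names, the default values `c_b=1.0` and `c_w=1.0`, and the choice of `tanh` as a bounded continuous activation are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def random_covariance(xs, widths, c_b=1.0, c_w=1.0, sigma=np.tanh, rng=None):
    """Return the random covariance seen by the layer after the last hidden layer.

    xs     : array of shape (|A|, n_0), one row per network input
    widths : hidden-layer widths (n_1, ..., n_L)
    Assumed initialization: biases ~ N(0, c_b), weights ~ N(0, c_w / fan_in).
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.asarray(xs, dtype=float)
    num_inputs, n_0 = xs.shape
    # First-layer pre-activations have a deterministic covariance.
    g = c_b + c_w * (xs @ xs.T) / n_0
    for n in widths:
        # Conditionally on g, the n neurons are i.i.d. N(0, g) vectors.
        z = rng.multivariate_normal(np.zeros(num_inputs), g, size=n)
        act = sigma(z)  # shape (n, |A|)
        # Covariance of the next layer's pre-activations, given this layer.
        g = c_b + c_w * (act.T @ act) / n
    return g


def sample_output(g, n_out, rng=None):
    """Sample the output layer given g: n_out independent N(0, g) columns."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.multivariate_normal(np.zeros(g.shape[0]), g, size=n_out)
    return z.T  # shape (|A|, n_out)
```

Under these assumptions, calling `random_covariance` with increasing widths should show the covariance concentrating around a deterministic limit; the deviation results summarized above concern the exponentially small probabilities of departures from that limit.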

Paper Structure (13 sections, 9 theorems, 91 equations)

Key Result

Theorem 2.1

Assume that the conditions on $\sigma$ and on $n_1,\dots,n_L$ hold. Then the sequence $\{(Z_h^{(L+1)}(x_\alpha)/\sqrt{v_n^*})_{\alpha h}\}_n$ satisfies the LDP on $\mathbb{R}^{|A|\times n_{L+1}}$ with speed $v_n^*$ and good rate function $I_{Z^{(L+1)}(\underline{x})}$ defined by [displayed formula not reproduced here], where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^{|A|\times n_{L+1}}$, $I_{G^{(L)}(\underline{x})}$ is defined by (fo
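
For reference, the phrase "satisfies the LDP with speed $v_n$ and good rate function $I$" has the following standard meaning in large deviation theory. This is the general definition, not a formula taken from the paper, and the symbols $X_n$, $\mathcal{X}$, $O$ and $C$ are generic placeholders. A sequence of random variables $\{X_n\}_n$ taking values in a topological space $\mathcal{X}$ satisfies the LDP with speed $v_n \to \infty$ and rate function $I : \mathcal{X} \to [0,\infty]$ if $I$ is lower semicontinuous and

$$
\liminf_{n\to\infty} \frac{1}{v_n} \log P(X_n \in O) \ge -\inf_{x \in O} I(x) \quad \text{for every open set } O \subseteq \mathcal{X},
$$

$$
\limsup_{n\to\infty} \frac{1}{v_n} \log P(X_n \in C) \le -\inf_{x \in C} I(x) \quad \text{for every closed set } C \subseteq \mathcal{X}.
$$

The rate function is called good if its level sets $\{x \in \mathcal{X} : I(x) \le \eta\}$ are compact for every $\eta \ge 0$. Informally, $P(X_n \in B)$ decays like $\exp(-v_n \inf_{x \in B} I(x))$ for sufficiently regular sets $B$.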

Theorems & Definitions (21)

  • Theorem 2.1: Large deviations
  • Theorem 2.2: Moderate deviations
  • Remark 2.1
  • Remark 2.2
  • Remark 2.3
  • Definition 3.1
  • Proposition 3.1
  • Lemma 3.1
  • Lemma 3.2: Lemma 1 in Giuliano, Macci and Pacchiarotti
  • Lemma 3.3
  • ...and 11 more