Large and moderate deviations for Gaussian neural networks
Claudio Macci, Barbara Pacchiarotti, Giovanni Luca Torrisi
TL;DR
The paper analyzes large deviations and moderate deviations for the output of Gaussian fully connected neural networks as layer widths grow, covering deep architectures with bounded and continuous pre-activation functions, and extending to single-input ReLU networks as well as shallow networks with general activations. A key tool is a recursive representation that expresses the network output as a centered Gaussian with a random covariance, enabling an inductive large-deviation analysis across layers. The main results provide a large-deviation principle with speed $v_n^*$ and a moderate-deviation principle with speed $1/a_n$, with rate functions expressed via a covariance recursion $\widehat{g}_{\underline{x}}^{(\ell)}$ and the Fenchel–Legendre transform $\kappa^*(\cdot;q)$; explicit forms are available when invertibility holds, and special treatments are given for ReLU with a single input. The findings quantify the exponential-scale probabilities of rare network-output events (and sensitivities in shallow/derivative cases), offering theoretical insight into the initialization-driven stochastic behavior of Gaussian nets and potential implications for training dynamics.
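For orientation, the notion of a large-deviation principle with a given speed can be written out as follows. This is the standard textbook definition, not a statement taken from the paper; the symbols $Z_n$, $v_n$, $I$, $F$ and $G$ are generic placeholders introduced here for illustration. A family of random variables $(Z_n)_n$ satisfies a large-deviation principle with speed $v_n \to \infty$ and rate function $I$ (lower semicontinuous, with values in $[0,\infty]$) if

$$
\limsup_{n\to\infty} \frac{1}{v_n}\log P(Z_n \in F) \le -\inf_{z\in F} I(z) \quad \text{for every closed set } F,
$$

$$
\liminf_{n\to\infty} \frac{1}{v_n}\log P(Z_n \in G) \ge -\inf_{z\in G} I(z) \quad \text{for every open set } G.
$$

Informally, $P(Z_n \approx z)$ decays like $e^{-v_n I(z)}$. A moderate-deviation principle is a family of statements of the same type for suitably rescaled quantities at slower speeds (indexed above by $1/a_n$), typically with a quadratic rate function, covering the regime between central-limit fluctuations and large deviations.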
Abstract
We prove large and moderate deviations for the output of Gaussian fully connected neural networks. The main achievements concern deep neural networks (i.e., when the model has more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we have results even if the pre-activation is ReLU. When the network is shallow (i.e., there is exactly one hidden layer), the large and moderate deviation principles hold for quite general pre-activation functions.
