Large Deviations of Gaussian Neural Networks with ReLU activation
Quirin Vogel
TL;DR
The work proves a large deviation principle (LDP) for deep neural networks with Gaussian weights and at most linearly growing activations, notably ReLU, extending prior results that required bounded activations. Because the relevant moment generating functions need not be finite, the proof replaces the Gärtner–Ellis framework with exponential equivalence and the multidimensional Cramér theorem. It also delivers a simplified rate function and a ReLU-specific power-series expansion intended to make minimization in high dimensions practical. The results rely on a precise analysis of Laplace transforms, an LDP established inductively across layers, and a refined characterization of the rate function via Moore–Penrose inverses, giving a rigorous asymptotic description of tail behavior in wide networks. The findings may be relevant to uncertainty quantification and tail-risk assessment in deep Gaussian models, and the explicit series expansions provide computationally tractable tools for ReLU-based architectures.
Abstract
We prove a large deviation principle for deep neural networks with Gaussian weights and at most linearly growing activation functions, such as ReLU. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and provide a power-series expansion for the ReLU case.
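
For orientation, the standard textbook definition of a large deviation principle can be written as follows. This is a generic sketch, not the paper's precise statement: the symbols $Y_n$ (a sequence of $\mathbb{R}^d$-valued random variables), $v_n$ (the speed, with $v_n \to \infty$) and $I$ (the rate function) are notation assumed here for illustration, and the abstract does not specify which network quantity or scaling plays these roles.

```latex
% Generic definition of an LDP (assumed notation: Y_n, v_n, I).
% I : R^d -> [0, \infty] is lower semicontinuous.
\begin{align*}
  \limsup_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Y_n \in F)
    &\le -\inf_{y \in F} I(y)
    && \text{for every closed } F \subseteq \mathbb{R}^d, \\
  \liminf_{n \to \infty} \frac{1}{v_n} \log \mathbb{P}(Y_n \in G)
    &\ge -\inf_{y \in G} I(y)
    && \text{for every open } G \subseteq \mathbb{R}^d.
\end{align*}
```

Informally, $\mathbb{P}(Y_n \approx y)$ decays like $\exp(-v_n I(y))$, so estimating the probability of a rare event reduces to minimizing $I$ over that event.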
