Large deviation principles for convolutional Bayesian neural networks

Federico Bassetti; Vassili De Palma; Lucia Ladelli

TL;DR

This work establishes a large deviation principle (LDP) for the sequence of conditional covariance matrices of convolutional neural networks under a Gaussian prior distribution on the weights. It also provides a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network.

Abstract

While suitably scaled convolutional neural networks (CNNs) with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known beyond this Gaussian limit. We establish a large deviation principle (LDP) for CNNs in the infinite-channel regime. We consider a broad class of multidimensional CNN architectures characterized by general receptive fields encoded through a patch-extractor function satisfying mild structural assumptions. Our main result is an LDP for the sequence of conditional covariance matrices under a Gaussian prior distribution on the weights. We further derive an LDP for the posterior distribution obtained by conditioning on a finite number of observations. In addition, we provide a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network. To the best of our knowledge, this is the first LDP established for convolutional neural networks.
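
For readers unfamiliar with the term, the standard textbook definition of an LDP can be written as follows. This is a generic statement, not the paper's theorem: the symbols $\mu_n$, $a_n$, $I$ and $\mathcal{X}$ are placeholders introduced here, and the abstract does not specify the speed or the rate function that the paper obtains.

```latex
% Generic definition of a large deviation principle (LDP).
% Assumed notation: (\mu_n) probability measures on a topological space \mathcal{X},
% speed a_n \to \infty, rate function I : \mathcal{X} \to [0,\infty] lower semicontinuous.
For every Borel set $A \subseteq \mathcal{X}$,
\[
  -\inf_{x \in A^{\circ}} I(x)
  \;\le\; \liminf_{n\to\infty} \frac{1}{a_n} \log \mu_n(A)
  \;\le\; \limsup_{n\to\infty} \frac{1}{a_n} \log \mu_n(A)
  \;\le\; -\inf_{x \in \overline{A}} I(x),
\]
where $A^{\circ}$ and $\overline{A}$ denote the interior and the closure of $A$.
```

Informally, $\mu_n(A)$ decays like $\exp(-a_n \inf_A I)$, which quantifies the probability of deviations away from the limiting behaviour.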

Paper Structure (19 sections, 22 theorems, 143 equations, 1 figure)

Introduction
Setting of the problem
Convolutional Neural Networks: definition
Examples
Conditional Gaussian structure
Main results
Covariance concentration and asymptotic normality
Large deviation for the covariance tensor
LDP under the posterior distribution
LDP for the rescaled network
A general covariance structure
Proof of Theorem \ref{prop:cnn_lln}
Law of large numbers
Conclusion of the proof
Proof of Theorem \ref{prop:gs_ldp0}
...and 4 more sections

Key Result

Proposition 2.1

Under Assumption A0, for every $\ell = 0,\ldots,L$, the random variables $[h_{c,i}^{(\ell+1)}(\mathbf{x}_\mu): c=1,\dots,C_{\ell+1},i=1,\dots,N_{\ell+1},\mu=1,\dots,P]$ are, conditionally on $\mathcal{F}^{\ell}$, jointly normal with zero means and covariances

Figures (1)

Figure 1: An example of 2 layers ($\ell=0,\ell=1$) in a 2D convolutional network. Both layers have two channels ($1=$ blue, $2=$ red). Here $N_0=N_1=9$ and $R^{(i,0)}$ is defined in \ref{R0exzeropadd}. Yellow dots represent the receptive field of pixel $(i_1,i_2)$, which has cardinality $M_0=9$. Zero-padding locations are represented by the dotted cells.
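
The receptive field with zero padding described in the caption can be illustrated with a minimal sketch. It assumes a $3\times 3$ image (so $N_0=9$) and a $3\times 3$ window (so $M_0=9$). The function name `extract_patch` and its arguments are hypothetical, chosen here for illustration; this is not the paper's patch-extractor definition.

```python
import numpy as np

def extract_patch(image, i1, i2, k=3):
    """Return the k x k receptive field of pixel (i1, i2), zero-padded at the border.

    Hypothetical helper: assumes an odd window size k centred on the pixel.
    """
    r = k // 2
    # Surround the image with r rows/columns of zeros (the "dotted cells").
    padded = np.pad(image, r, mode="constant", constant_values=0.0)
    # Pixel (i1, i2) of `image` sits at (i1 + r, i2 + r) in `padded`,
    # so its window starts at (i1, i2) in padded coordinates.
    return padded[i1:i1 + k, i2:i2 + k]

image = np.arange(1.0, 10.0).reshape(3, 3)   # 3 x 3 image, N_0 = 9 pixels
corner_patch = extract_patch(image, 0, 0)    # M_0 = 9 entries, 5 of them padding
center_patch = extract_patch(image, 1, 1)    # no padding needed at the centre
```

For the corner pixel the window overlaps the padded border, so five of its nine entries are zero, while the window of the central pixel coincides with the image itself.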

Theorems & Definitions (40)

Remark 1: Tensor notation
Proposition 2.1
Remark 2
Remark 3
Theorem 3.1: Covariance concentration in CNNs
Theorem 3.2: Gaussian limit in CNNs
Theorem 3.3: LDP in CNNs
Lemma 3.4
Proposition 3.5: Posterior LDP in CNNs
Proposition 3.6: LDP for the network output
...and 30 more
