Large deviation principles for convolutional Bayesian neural networks

Federico Bassetti, Vassili De Palma, Lucia Ladelli

TL;DR

This work establishes a large deviation principle (LDP) for the sequence of conditional covariance matrices of convolutional neural networks under a Gaussian prior distribution on the weights, and provides a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network.

Abstract

While suitably scaled convolutional neural networks (CNNs) with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known beyond this Gaussian limit. We establish a large deviation principle (LDP) for CNNs in the infinite-channel regime. We consider a broad class of multidimensional CNN architectures characterized by general receptive fields encoded through a patch-extractor function satisfying mild structural assumptions. Our main result is an LDP for the sequence of conditional covariance matrices under a Gaussian prior distribution on the weights. We further derive an LDP for the posterior distribution obtained by conditioning on a finite number of observations. In addition, we provide a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network. To the best of our knowledge, this is the first large deviation principle established for convolutional neural networks.
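
For readers less familiar with the terminology, the standard textbook definition of an LDP is recalled here. The symbols $\mu_n$, $\mathcal{X}$, $a_n$ and $I$ are generic placeholders and are not taken from the paper; this summary does not state the speed or the rate function obtained there. A sequence of probability measures $(\mu_n)_{n\ge 1}$ on a topological space $\mathcal{X}$ satisfies an LDP with speed $a_n \to \infty$ and lower semicontinuous rate function $I:\mathcal{X}\to[0,\infty]$ if, for every closed set $F\subseteq\mathcal{X}$ and every open set $G\subseteq\mathcal{X}$,

$$
\limsup_{n\to\infty} \frac{1}{a_n}\log \mu_n(F) \le -\inf_{x\in F} I(x),
\qquad
\liminf_{n\to\infty} \frac{1}{a_n}\log \mu_n(G) \ge -\inf_{x\in G} I(x).
$$

Informally, $\mu_n(A)$ decays like $\exp\bigl(-a_n \inf_{x\in A} I(x)\bigr)$, so the rate function quantifies how unlikely deviations from the typical limit are.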

Paper Structure (19 sections, 22 theorems, 143 equations, 1 figure)


Key Result

Proposition 2.1

Under Assumption A0, for every $\ell = 0,\ldots,L$, the collection of random variables $[h_{c,i}^{(\ell+1)}(\mathbf{x}_\mu): c=1,\dots,C_{\ell+1},i=1,\dots,N_{\ell+1},\mu=1,\dots,P]$, conditionally on $\mathcal{F}^{\ell}$, are jointly normal with zero means and covariances

Figures (1)

  • Figure 1: An example of 2 layers ($\ell=0,\ell=1$) in a 2D convolutional network. Both layers have two channels ($1=$ blue, $2=$ red). Here $N_0=N_1=9$ and $R^{(i,0)}$ is defined in \ref{R0exzeropadd}. Yellow dots represent the receptive field of pixel $(i_1,i_2)$, with cardinality $M_0=9$. Zero-padding locations are represented by the dotted cells.

Theorems & Definitions (40)

  • Remark 1: Tensor notation
  • Proposition 2.1
  • Remark 2
  • Remark 3
  • Theorem 3.1: Covariance concentration in CNNs
  • Theorem 3.2: Gaussian limit in CNNs
  • Theorem 3.3: LDP in CNNs
  • Lemma 3.4
  • Proposition 3.5: Posterior LDP in CNNs
  • Proposition 3.6: LDP for the network output
  • ...and 30 more