A Gaussian Process perspective on Convolutional Neural Networks

Anastasia Borovykh

TL;DR

The paper reframes convolutional neural networks within a Gaussian-process framework to uncover when CNN outputs behave like GP priors, despite the non-iid, compositionally dependent sums characteristic of convolutional layers. It leverages a Bentkus Lyapunov-type bound to justify GP-like behavior in the first layer and derives a recursive, convolutional kernel that evolves with depth, linking CNNs to additive/convolutional GP kernels. Numerical experiments using MMD show that, for moderate filter sizes and common activations, CNN priors quickly resemble GP priors, and CNN posteriors under input conditioning align with GP posteriors for time-series data. This work provides a principled Bayesian lens for CNNs, enabling analytic uncertainty via GP machinery and clarifying how the convolutional structure governs GP convergence and kernel formation.

Abstract

In this paper we cast the well-known convolutional neural network in a Gaussian process perspective. In this way we hope to gain additional insights into the performance of convolutional networks, in particular understand under what circumstances they tend to perform well and what assumptions are implicitly made in the network. While for fully-connected networks the properties of convergence to Gaussian processes have been studied extensively, little is known about situations in which the output from a convolutional network approaches a multivariate normal distribution.

Paper Structure

This paper contains 16 sections, 2 theorems, 17 equations, 5 figures.

Key Result

Theorem 1

Let $X_1,\cdots,X_M$ be independent random vectors taking values in $\mathbb{R}^d$ such that $\mathbb{E}[X_i]=0$ for all $i$. Let $S=X_1+\cdots+X_M$. Assume that the covariance operator $\Sigma^2$ of $S$ is invertible. Let $Z\sim\mathcal{N}(0,\Sigma^2)$, a centered Gaussian with covariance matrix $\Sigma^2$ ...

Figures (5)

  • Figure 1: A configuration of a two-layer fully-connected network (left) and a convolutional network with a filter size of two (right). In the convolutional network the weights are shared across the input layer, as indicated by the matching colors.
  • Figure 2: The angular structure of the convolutional kernel as a function of $\theta_1$ with $\theta_2=0.5$ (left) and $\theta_2=3$ (right) and its evolution with depth for $\sigma_w^2 = \frac{1.6}{M}$. The red line shows the fully-connected kernel (without averaging).
  • Figure 3: MMD between the GP and CNN with a linear activation (top left), with a ReLU activation using \ref{eq:kernelgp} (top right), and with a ReLU (bottom left) and a hyperbolic tangent (bottom right) using the MC evaluation of \ref{eq:fingp}, for inputs that are not identically distributed.
  • Figure 4: MMD between the GP and CNN with linear (top left), ReLU (top right), and hyperbolic tangent (bottom) activations for an AR(2) series $x_i = \phi_1x_{i-1}+\phi_2x_{i-2}$ with coefficients $\phi_1=-0.6$ and $\phi_2=0.2$ and $d=50$.
  • Figure 5: A comparison between the trained CNN mean (left) and posterior inference in the corresponding Gaussian process (right) with credible intervals. The CNN has one hidden layer and a filter width of three.
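
The MMD comparisons in Figures 3 and 4 can be illustrated with a minimal sketch. It is not the paper's experimental setup. It draws a single filter response $\sum_i w_i x_i$ over an AR(2) input and compares the samples with a variance-matched Gaussian using a biased RBF-kernel MMD estimate. The uniform weight prior, the start values `x0` and `x1`, the kernel bandwidth, the sample count, and all function names are assumptions made for illustration only.

```python
import math
import random


def ar2_series(length, phi1=-0.6, phi2=0.2, x0=1.0, x1=1.0):
    """Noise-free AR(2) recursion x_i = phi1*x_{i-1} + phi2*x_{i-2}.

    The coefficients are those quoted for Figure 4; the start values
    x0 and x1 are assumptions made for this sketch.
    """
    xs = [x0, x1]
    while len(xs) < length:
        xs.append(phi1 * xs[-1] + phi2 * xs[-2])
    return xs[:length]


def filter_response(xs, rng, sigma_w2=1.0):
    """One draw of sum_i w_i * x_i with independent zero-mean weights.

    The weights are uniform with variance sigma_w2 / M (an assumed,
    deliberately non-Gaussian prior), so the summands are independent
    but not identically distributed.
    """
    m = len(xs)
    half_width = math.sqrt(3.0 * sigma_w2 / m)  # Var U(-a, a) = a^2 / 3
    return sum(rng.uniform(-half_width, half_width) * x for x in xs)


def mmd2(a, b, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def mean_kernel(u, v):
        total = 0.0
        for p in u:
            for q in v:
                total += math.exp(-((p - q) ** 2) / (2.0 * bandwidth ** 2))
        return total / (len(u) * len(v))

    return mean_kernel(a, a) + mean_kernel(b, b) - 2.0 * mean_kernel(a, b)


def compare_to_gaussian(m, n_samples=200, seed=0):
    """Squared MMD between filter responses and a variance-matched Gaussian."""
    rng = random.Random(seed)
    xs = ar2_series(m)
    responses = [filter_response(xs, rng) for _ in range(n_samples)]
    # Exact standard deviation of the sum when sigma_w2 = 1.
    std = math.sqrt(sum(x * x for x in xs) / m)
    gauss = [rng.gauss(0.0, std) for _ in range(n_samples)]
    return mmd2(responses, gauss, bandwidth=std)


if __name__ == "__main__":
    for m in (1, 3, 30):
        print(f"M={m:2d}  MMD^2 estimate = {compare_to_gaussian(m):.5f}")
```

With only a few hundred samples the estimate is noisy, so it is best averaged over several seeds before reading anything into how it changes with the filter size `m`.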

Theorems & Definitions (2)

  • Theorem 1: A Lyapunov-type bound in $\mathbb{R}^d$
  • Theorem 2: Independent and identically distributed case