FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Tristan Cinquin, Marvin Pförtner, Vincent Fortuin, Philipp Hennig, Robert Bamler

TL;DR

Since Lebesgue densities do not exist on infinite-dimensional function spaces, this work recasts training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network.

Abstract

Laplace approximations are popular techniques for endowing deep networks with epistemic uncertainty estimates as they can be applied without altering the predictions of the trained network, and they scale to large models and datasets. While the choice of prior strongly affects the resulting posterior distribution, computational tractability and lack of interpretability of the weight space typically limit the Laplace approximation to isotropic Gaussian priors, which are known to cause pathological behavior as depth increases. As a remedy, we directly place a prior on function space. More precisely, since Lebesgue densities do not exist on infinite-dimensional function spaces, we recast training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network. Through the GP prior, one can express structured and interpretable inductive biases, such as regularity or periodicity, directly in function space, while still exploiting the implicit inductive biases that allow deep networks to generalize. After model linearization, the training objective induces a negative log-posterior density to which we apply a Laplace approximation, leveraging highly scalable methods from matrix-free linear algebra. Our method provides improved results where prior knowledge is abundant (as is the case in many scientific inference tasks). At the same time, it stays competitive for black-box supervised learning problems, where neural networks typically excel.
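For orientation, the sketch below illustrates the standard weight-space Laplace approximation with an isotropic Gaussian prior, which the abstract names as the usual baseline. It is not the FSP-Laplace method. The one-parameter logistic model, the toy data, and the helper names `neg_log_posterior_grads` and `laplace_approximation` are all assumptions made for illustration.

```python
import math


def neg_log_posterior_grads(w, xs, ys, prior_std):
    """First and second derivative of the negative log-posterior in w.

    Toy model (an assumption): p(y=1 | x, w) = sigmoid(w * x) with a
    N(0, prior_std^2) prior on the single weight w.
    """
    grad = w / prior_std**2
    hess = 1.0 / prior_std**2
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-w * x))
        grad += (p - y) * x
        hess += p * (1.0 - p) * x * x
    return grad, hess


def laplace_approximation(xs, ys, prior_std=1.0, steps=50):
    """Return (mode, variance) of the Gaussian fitted at the posterior mode."""
    w = 0.0
    for _ in range(steps):
        # Newton step on the strictly convex negative log-posterior.
        grad, hess = neg_log_posterior_grads(w, xs, ys, prior_std)
        w -= grad / hess
    # The Laplace variance is the inverse curvature at the mode.
    _, hess = neg_log_posterior_grads(w, xs, ys, prior_std)
    return w, 1.0 / hess


# Hypothetical toy data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w_map, var = laplace_approximation(xs, ys)
print(f"mode = {w_map:.4f}, Laplace variance = {var:.4f}")
```

The mode is left untouched by the approximation, which is why the predictions of the trained model do not change. Only the curvature at the mode is used to attach a Gaussian uncertainty estimate.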

Paper Structure (55 sections, 6 theorems, 23 equations, 14 figures, 4 tables, 2 algorithms)

This paper contains 55 sections, 6 theorems, 23 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

proposition 0

Let Assumptions asm:gp, asm:potential, and asm:dnnfns-rkhs-closed-compact hold. For $\lambda > 0$, define $\Phi^{{\bm{Y}}, \lambda} \colon {\mathbb{B}}\to {\mathbb{R}}, {\bm{f}} \mapsto \Phi^{{\bm{Y}}}({\bm{f}}) + \frac{1}{2 \lambda^2} d_{{\mathbb{B}}}^2({\bm{f}}, {\mathbb{F}}).$ Then the posterior measure $\mathrm{P}_
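To make the definition of $\Phi^{{\bm{Y}}, \lambda}$ concrete, here is a toy numerical sketch. It assumes a finite-dimensional stand-in for ${\mathbb{B}}$ (points in the plane with Euclidean distance), a finite candidate set standing in for ${\mathbb{F}}$, and a hypothetical quadratic potential. None of these choices come from the paper; they only illustrate how the distance penalty behaves as $\lambda$ shrinks.

```python
import math


def distance_to_set(f, candidates):
    """Euclidean distance from the point f to the nearest candidate."""
    return min(math.dist(f, g) for g in candidates)


def relaxed_potential(f, potential, candidates, lam):
    """Potential plus the squared-distance penalty d(f, F)^2 / (2 * lam^2)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return potential(f) + distance_to_set(f, candidates) ** 2 / (2.0 * lam**2)


# Hypothetical setup: observation y, quadratic misfit, two representable points.
y = (1.0, 0.0)


def potential(f):
    return 0.5 * math.dist(f, y) ** 2


candidates = [(0.0, 0.0), (1.0, 1.0)]

f_inside = (1.0, 1.0)   # lies in the candidate set, so the penalty vanishes
f_outside = (1.0, 0.0)  # fits y exactly but is at distance 1 from the set

for lam in (1.0, 0.1):
    print(lam,
          relaxed_potential(f_inside, potential, candidates, lam),
          relaxed_potential(f_outside, potential, candidates, lam))
```

In this toy setting, a point inside the candidate set pays no penalty for any $\lambda$, while a point outside it is penalized increasingly as $\lambda$ decreases.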

Figures (14)

  • Figure 1: FSP-Laplace allows for efficient approximate Bayesian neural network (BNN) inference under interpretable function space priors. Using our method, it is possible to encode functional properties like smoothness, lengthscale, or periodicity through a Gaussian process (GP) prior. The gray data points in the plots are noisy observations of a periodic function.
  • Figure 2: Results for the ocean current modeling experiment. We report the mean velocity vectors, the norm of their standard deviation, and the squared errors of the compared methods. We find that, unlike the Laplace, FSP-Laplace accurately captures ocean current dynamics.
  • Figure 3: Results using our method (FSP-Laplace) as a surrogate model for Bayesian optimization. We find that FSP-Laplace performs particularly well on lower-dimensional problems, where it converges more quickly and to higher rewards than the Laplace, obtaining scores comparable to those of the Gaussian process (GP).
  • Figure C.1: Just like the Gaussian process (GP) and sparse GP, FSP-Laplace captures the smoothness behavior specified by the RBF covariance function of the Gaussian process prior.
  • Figure C.2: Unlike the linearized Laplace, FSP-Laplace can incorporate periodicity within the support of the data using a periodic prior covariance function, without additional periodic features.
  • ...and 9 more figures
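
Figures 1, C.1, and C.2 refer to RBF and periodic prior covariance functions. The sketch below shows the textbook forms of these two kernels, as an assumption about what such priors look like. The captions do not state the exact kernel parameterization used in the paper, and the function names and default hyperparameters here are hypothetical.

```python
import math


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance; encodes smoothness."""
    return variance * math.exp(-0.5 * (x1 - x2) ** 2 / lengthscale**2)


def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Standard exp-sine-squared covariance; encodes periodicity."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return variance * math.exp(-2.0 * s * s / lengthscale**2)


# Inputs exactly one period apart are perfectly correlated under the
# periodic kernel, but only weakly correlated under the RBF kernel.
x, shift = 0.3, 1.0
print(periodic_kernel(x, x + shift, period=1.0))
print(rbf_kernel(x, x + shift))
```

Under the periodic kernel, function values one period apart are tied together a priori, which is the kind of structure a prior can express directly in function space.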

Theorems & Definitions (11)

  • definition 1: Weak Mode (Lambley, 2023)
  • proposition 0
  • proposition 0
  • lemma A.1
  • proof
  • lemma A.2
  • proof
  • proposition A.2
  • proof
  • proposition A.2
  • ...and 1 more