FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Tristan Cinquin; Marvin Pförtner; Vincent Fortuin; Philipp Hennig; Robert Bamler

TL;DR

Since Lebesgue densities do not exist on infinite-dimensional function spaces, this work recasts training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network.

Abstract

Laplace approximations are popular techniques for endowing deep networks with epistemic uncertainty estimates as they can be applied without altering the predictions of the trained network, and they scale to large models and datasets. While the choice of prior strongly affects the resulting posterior distribution, computational tractability and lack of interpretability of the weight space typically limit the Laplace approximation to isotropic Gaussian priors, which are known to cause pathological behavior as depth increases. As a remedy, we directly place a prior on function space. More precisely, since Lebesgue densities do not exist on infinite-dimensional function spaces, we recast training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network. Through the GP prior, one can express structured and interpretable inductive biases, such as regularity or periodicity, directly in function space, while still exploiting the implicit inductive biases that allow deep networks to generalize. After model linearization, the training objective induces a negative log-posterior density to which we apply a Laplace approximation, leveraging highly scalable methods from matrix-free linear algebra. Our method provides improved results where prior knowledge is abundant (as is the case in many scientific inference tasks). At the same time, it stays competitive for black-box supervised learning problems, where neural networks typically excel.
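The abstract describes a training objective that combines a data-fit term with a GP prior expressed in function space. The following is a minimal NumPy sketch of what such an objective could look like, assuming (this is not stated on this page) that the prior enters as a quadratic penalty on network outputs at a finite set of context points, with a zero-mean GP and an RBF covariance. All function names, the toy `sine_model`, and the hyperparameter values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """RBF covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_regularizer(f_context, x_context, jitter=1e-6):
    """0.5 * f^T K^{-1} f for an assumed zero-mean GP prior at the context points."""
    K = rbf_kernel(x_context, x_context) + jitter * np.eye(len(x_context))
    chol = np.linalg.cholesky(K)              # K = L L^T
    alpha = np.linalg.solve(chol, f_context)  # alpha = L^{-1} f
    return 0.5 * float(alpha @ alpha)         # equals 0.5 * f^T K^{-1} f

def gaussian_nll(f_train, y_train, noise_std=0.1):
    """Gaussian negative log-likelihood, up to an additive constant."""
    resid = y_train - f_train
    return 0.5 * float(resid @ resid) / noise_std**2

def toy_objective(model, params, x_train, y_train, x_context):
    """Data-fit term plus function-space penalty (assumed form)."""
    fit = gaussian_nll(model(params, x_train), y_train)
    prior = gp_regularizer(model(params, x_context), x_context)
    return fit + prior

def sine_model(params, x):
    """Hypothetical stand-in for a neural network: amplitude * sin(frequency * x)."""
    return params[0] * np.sin(params[1] * x)

x_context = np.linspace(-2.0, 2.0, 8)
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
params = np.array([1.0, 1.0])

smooth_penalty = gp_regularizer(np.sin(x_context), x_context)
rough_penalty = gp_regularizer(np.array([1.0, -1.0] * 4), x_context)
objective_value = toy_objective(sine_model, params, x_train, y_train, x_context)
```

Under the RBF prior, a rapidly oscillating function is penalized far more than a smooth one, which is the sense in which a GP prior can encode regularity directly in function space. A Laplace approximation would then be built around the minimizer of such an objective; that step is omitted here.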

Paper Structure (55 sections, 6 theorems, 23 equations, 14 figures, 4 tables, 2 algorithms)

Introduction
The need for function-space priors in BNNs.
Preliminaries: Laplace approximation in weight space
The linearized Laplace approximation.
FSP-Laplace: Laplace approximation under function-space priors
Laplace approximations in function space
MAP estimation in neural networks under Gaussian process priors.
The FSP-Laplace objective as an unnormalized log-density.
Algorithmic Considerations
Training with the FSP-Laplace objective function.
Efficient linearized Laplace approximations of the FSP-Laplace objective.
Choice of context points.
Experiments
Baselines.
Qualitative evaluation on synthetic data.
...and 40 more sections

Key Result

proposition 0

Let Assumptions asm:gp, asm:potential, and asm:dnnfns-rkhs-closed-compact hold. For $\lambda > 0$, define $\Phi^{{\bm{Y}}, \lambda} \colon {\mathbb{B}}\to {\mathbb{R}}, {\bm{f}} \mapsto \Phi^{{\bm{Y}}}({\bm{f}}) + \frac{1}{2 \lambda^2} d_{{\mathbb{B}}}^2({\bm{f}}, {\mathbb{F}}).$ Then the posterior measure $\mathrm{P}_

Figures (14)

Figure 1: FSP-Laplace allows for efficient approximate Bayesian neural network (BNN) inference under interpretable function space priors. Using our method, it is possible to encode functional properties like smoothness, lengthscale, or periodicity through a Gaussian process (GP) prior. The gray data points in the plots are noisy observations of a periodic function.
Figure 2: Results for the ocean current modeling experiment. We report the mean velocity vectors, the norm of their standard deviation, and the squared errors of the compared methods. We find that, unlike the Laplace baseline, FSP-Laplace accurately captures ocean current dynamics.
Figure 3: Results using our method (FSP-Laplace) as a surrogate model for Bayesian optimization. We find that FSP-Laplace performs particularly well on lower-dimensional problems, where it converges more quickly and to higher rewards than the Laplace baseline, obtaining scores comparable to those of the Gaussian process (GP).
Figure C.1: Just like the Gaussian process (GP) and sparse GP, FSP-Laplace captures the smoothness behavior specified by the RBF covariance function of the Gaussian process prior.
Figure C.2: Unlike the linearized Laplace, FSP-Laplace can incorporate periodicity within the support of the data using a periodic prior covariance function, without additional periodic features.
...and 9 more figures

Theorems & Definitions (11)

definition 1: Weak Mode [Lambley2023StrongMAP]
proposition 0
proposition 0
lemma A.1
proof
lemma A.2
proof
proposition A.2
proof
proposition A.2
...and 1 more
