Optimal Subspace Inference for the Laplace Approximation of Bayesian Neural Networks

Josua Faller, Jörg Martin

TL;DR

This paper addresses tractable uncertainty quantification for Bayesian neural networks by deriving the optimal affine subspace model within the Laplace approximation and by providing a practical workflow for constructing it. The key theoretical result establishes an optimal subspace that approximates the full epistemic covariance in rank-$s$ form, enabling reliable uncertainty estimates with far fewer parameters. Empirically, low-rank subspace constructions, especially those based on KFAC approximations, outperform subset-based methods on both regression and classification tasks, and a trace-based criterion ranks subspaces effectively. The work offers a scalable, principled route to subspace-based Bayesian posteriors in large networks; its limitations stem from the quality of the posterior-covariance approximations and from the storage cost of the projection matrices.

Abstract

Subspace inference for neural networks assumes that a subspace of their parameter space suffices to produce a reliable uncertainty quantification. In this work, we mathematically derive the optimal subspace model for a Bayesian inference scenario based on the Laplace approximation. We demonstrate empirically that, in the optimal case, fewer than 1% of the parameters often suffice to obtain a reliable estimate of the full Laplace approximation. Since the optimal solution is derived, we can evaluate all other subspace models against a baseline. In addition, we give an approximation of our method that is applicable to larger problem settings, in which the optimal solution is not computable, and compare it to existing subspace models from the literature. In general, our approximation scheme outperforms previous work. Furthermore, we present a metric to qualitatively compare different subspace models even if the exact Laplace approximation is unknown.
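To make the idea of a rank-$s$ epistemic covariance concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a linearised model with a given Jacobian `J` and a symmetric positive definite posterior precision `H` (both random toy data here), and takes the subspace covariance to be $J P (P^\top H P)^{-1} P^\top J^\top$. The helper names `subspace_epistemic_cov` and `eckart_young_projection` are hypothetical. The second helper builds a $P$ that reaches the best rank-$s$ approximation of $J H^{-1} J^\top$ in Frobenius norm by the Eckart-Young theorem; whether this coincides exactly with the construction in the paper's Theorem 1 is an assumption.

```python
import numpy as np

def subspace_epistemic_cov(J, H, P):
    """Return J P (P^T H P)^{-1} P^T J^T for a linearised model.

    J: (n, p) Jacobian of the outputs w.r.t. the parameters (toy data here).
    H: (p, p) symmetric positive definite posterior precision (toy data here).
    P: (p, s) projection matrix with full column rank.
    """
    JP = J @ P                          # (n, s)
    reduced_precision = P.T @ H @ P     # (s, s)
    return JP @ np.linalg.solve(reduced_precision, JP.T)

def eckart_young_projection(J, H, s):
    """Build a P whose subspace covariance is the best rank-s fit to J H^{-1} J^T."""
    L = np.linalg.cholesky(H)                   # H = L L^T
    A = np.linalg.solve(L, J.T).T               # A = J L^{-T}, so A A^T = J H^{-1} J^T
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.linalg.solve(L.T, Vt[:s].T)       # P = L^{-T} V_s, hence P^T H P = I

# Toy problem: n outputs, p parameters, subspace dimension s.
rng = np.random.default_rng(0)
n, p, s = 40, 30, 5
J = rng.normal(size=(n, p))
B = rng.normal(size=(p, p))
H = B @ B.T + np.eye(p)
full_cov = J @ np.linalg.solve(H, J.T)

def rel_err(cov):
    # Frobenius-norm relative error against the full covariance.
    return np.linalg.norm(cov - full_cov) / np.linalg.norm(full_cov)

err_random = rel_err(subspace_epistemic_cov(J, H, rng.normal(size=(p, s))))
err_best = rel_err(subspace_epistemic_cov(J, H, eckart_young_projection(J, H, s)))
print(f"random P: {err_random:.3f}   SVD-based P: {err_best:.3f}")
```

In this toy setting the SVD-based $P$ can never do worse than a random $P$ of the same dimension, which illustrates why a derived optimum is useful as a baseline for other subspace choices.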

Paper Structure

This paper contains 25 sections, 4 theorems, 53 equations, 8 figures, 6 tables.

Key Result

Lemma 1

In the setting above, consider a full rank $P\in \mathbb{R}^{p\times s}$. For the posterior $\tilde{p}(\mu|\mathcal{D})\propto \tilde{p}(\mu) \tilde{p}(\mathcal{D}|\mu)$ with prior $\tilde{p}(\mu)$ as in \ref{eq:mu_prior} we have the LA

Figures (8)

  • Figure 1: Comparison of low-rank approximations and subset methods on different regression datasets. Different choices of $P$ are marked by different colours and line types. The first row shows the relative error \ref{eq:RelativeError} and the second row the logarithm of the trace \ref{eq:TraceOrderMain} of the epistemic covariance matrix. Values missing from the log-trace plots correspond to a trace of zero at those values of $s$ (e.g. SWAG for the lowest $s$ in Red Wine).
  • Figure 2: Relative error \ref{eq:RelativeError} and logarithm of the trace \ref{eq:TraceOrderMain} of the epistemic covariance matrix for MNIST and FashionMNIST.
  • Figure 3: Relative error \ref{eq:RelativeError} (left) and trace criterion \ref{eq:TraceOrderMain} (right) for the corrupted MNIST datasets \cite{Mu2019mnistc} and three different dimensions $s=100,\, 500,\, 1000$ (shown by markers of increasing size). Different choices of $P$ are indicated by different colours and marker shapes: square markers $\blacksquare$ indicate subset-based methods, whereas discs $\bullet$ indicate the low-rank methods proposed in this work. The colour coding is the same as in Figure \ref{fig:regression_plots}. Note that two choices of $P$ are constructed from a diagonal approximation of the Hessian: one subset-based (pink squares) and one low-rank-based (pink discs). Results are averaged over five seeds. Standard errors are shown as bars wherever they exceed the marker size.
  • Figure 4: Evaluation with the trace criterion \ref{eq:TraceOrderMain} for CIFAR10 and ImageNet10 and different choices of $P$. Missing values in Figure \ref{fig:imagenet10_resnet18} are due to vanishing trace values.
  • Figure 5: Relative error \ref{eq:RelativeError} of the epistemic covariance matrix for the studied subset methods, with $s$ ranging up to the number of parameters $p$, for MNIST.
  • ...and 3 more figures
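
The two quantities plotted in these figures can be sketched as follows. This is a hedged illustration on random toy data, not the paper's code: the exact definitions behind the relative-error and trace-criterion equations are not reproduced on this page, so the Frobenius norm in `relative_error` and the form $J P (P^\top H P)^{-1} P^\top J^\top$ of the subspace covariance are assumptions, and all function names are hypothetical. The sketch relies on one fact that holds for any full-rank $P$ and positive definite $H$: $P (P^\top H P)^{-1} P^\top \preceq H^{-1}$, so the trace of a subspace covariance never exceeds that of the full covariance. A larger trace can therefore be used to rank subspaces even when the full covariance is unavailable.

```python
import numpy as np

def epistemic_cov(J, H, P):
    """Subspace epistemic covariance J P (P^T H P)^{-1} P^T J^T (assumed form)."""
    JP = J @ P
    return JP @ np.linalg.solve(P.T @ H @ P, JP.T)

def relative_error(cov_sub, cov_full):
    """Frobenius-norm relative error; needs the full covariance as reference."""
    return np.linalg.norm(cov_sub - cov_full) / np.linalg.norm(cov_full)

def log_trace(cov):
    """Logarithm of the trace; needs no reference. Undefined for a zero trace."""
    return np.log(np.trace(cov))

# Toy data: n outputs, p parameters, random Jacobian and precision matrix.
rng = np.random.default_rng(1)
n, p = 20, 12
J = rng.normal(size=(n, p))
B = rng.normal(size=(p, p))
H = B @ B.T + np.eye(p)
full_cov = J @ np.linalg.solve(H, J.T)

# Subset-style P: keep s coordinates of the parameter vector, growing s step by step.
order = rng.permutation(p)
traces, errors = [], []
for s in range(1, p + 1):
    P = np.eye(p)[:, order[:s]]
    cov = epistemic_cov(J, H, P)
    traces.append(np.trace(cov))
    errors.append(relative_error(cov, full_cov))

print("log-trace, s=1 vs s=p:", np.log(traces[0]), np.log(traces[-1]))
print("relative error, s=1 vs s=p:", errors[0], errors[-1])
```

Because the subsets are nested, the trace grows with $s$ and reaches the full trace at $s=p$, where the relative error vanishes. A trace of exactly zero has no finite logarithm, which matches the missing points mentioned in the captions.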

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1: Existence of an optimal subspace model for the Laplace approximation
  • Theorem : Existence of an optimal subspace model for the Laplace approximation
  • proof
  • Lemma 2
  • proof