Partial Trace-Class Bayesian Neural Networks

Arran Carter, Torben Sell

TL;DR

The paper tackles the computational burden of uncertainty quantification in Bayesian neural networks by introducing Partial Trace-Class Bayesian Neural Networks (PaTraC BNNs), which reduce the number of Bayesian parameters while preserving informative uncertainty estimates. It leverages a trace-class prior that imposes an intrinsic ordering of network parameters and employs Hilbert-space MCMC (pCNL) for posterior inference, enabling efficient training even in high dimensions. Three architectures—Sep-PaTraC, Out-PaTraC, and Mix-PaTraC—are proposed to mix Bayesian and non-Bayesian components, with Mix-PaTraC offering the closest approximation to a full BNN at lower cost. Across synthetic and real datasets (CIFAR-10 and Abalone), PaTraC BNNs achieve competitive uncertainty quantification with substantial speedups and reduced memory usage, providing a scalable path for reliable uncertainty in deep learning applications. The work also outlines theoretical extensions and practical considerations, including environmental benefits and potential infinite-width analyses.

Abstract

Bayesian neural networks (BNNs) allow rigorous uncertainty quantification in deep learning, but often come at a prohibitive computational cost. We propose three different innovative architectures of partial trace-class Bayesian neural networks (PaTraC BNNs) that enable uncertainty quantification comparable to standard BNNs but use significantly fewer Bayesian parameters. These PaTraC BNNs have computational and statistical advantages over standard Bayesian neural networks in terms of speed and memory requirements. Our proposed methodology therefore facilitates reliable, robust, and scalable uncertainty quantification in neural networks. The three architectures build on trace-class neural network priors which induce an ordering of the neural network parameters, and are thus a natural choice in our framework. In a numerical simulation study, we verify the claimed benefits, and further illustrate the performance of our proposed methodology on a real-world dataset.

Paper Structure

This paper contains 20 sections, 15 equations, 4 figures, 7 tables, 1 algorithm.

Figures (4)

  • Figure 1: An illustration of the three different PaTraC BNN structures; green lines denote Bayesian weights, green nodes have associated Bayesian biases, black lines denote optimised weights, and black nodes have associated optimised biases. The two red lines at the output in the sep-PaTraC BNN subfigure denote a fixed weight of $1$ (there is no bias on the very last node in the diagram). Note that we explicitly visualise the non-linear activation function as distinct lines going into the post-activation nodes, which are represented by pink squares.
  • Figure 2: Plots of the different posterior distributions; shown are $100$ samples from the respective posteriors (orange). The green dotted lines indicate the true mean function we are attempting to predict, and the dashed purple lines are the respective $2.5\%$ and $97.5\%$ posterior quantiles.
  • Figure 3: Box plots of observed coverage for the different architectures across $100$ experiments. Black lines: target coverages of $65\%$, $95\%$, and $99\%$, respectively.
  • Figure 4: Results from the abalone data set. Left: for a single test point, the true number of rings is shown as a red vertical line, and the prediction of the trained neural network is shown as a blue vertical line. The posterior predictive distributions are shown for the full BNN (blue line), sep-PaTraC (orange and green dotted lines), mix-PaTraC (red and purple dashed lines), and out-PaTraC (brown and pink dotted-dashed lines). Right: kernel density estimates for posterior quality comparison; shown is the empirical distribution of $\sum_{i=1}^{500}\mathbbm{1}\{f_{\theta_i}(x)<y\}/500$ over all $(x,y)$ in the test set. Distributions close to uniform correspond to better posterior quality, while convex and concave distributions correspond to over- and under-confident posteriors, respectively.
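
The calibration quantities in the captions of Figures 3 and 4 can be illustrated with a minimal NumPy sketch. It is not the authors' code. The function names `rank_statistic` and `central_coverage`, the array layout, and the synthetic Gaussian data are all assumptions made for illustration. The first function computes, for each test point, the fraction of posterior samples lying below the observed value, which is the statistic in the Figure 4 caption. The second computes the observed coverage of a central credible interval, in the spirit of Figure 3.

```python
import numpy as np


def rank_statistic(pred_samples, y):
    """Fraction of posterior samples below the observation, per test point.

    pred_samples: array of shape (n_samples, n_test), one row per posterior draw
                  (hypothetical layout, chosen for this sketch).
    y:            array of shape (n_test,), the observed targets.
    """
    pred_samples = np.asarray(pred_samples, dtype=float)
    y = np.asarray(y, dtype=float)
    return (pred_samples < y[None, :]).mean(axis=0)


def central_coverage(pred_samples, y, level):
    """Share of test points whose target lies in the central `level` interval."""
    pred_samples = np.asarray(pred_samples, dtype=float)
    y = np.asarray(y, dtype=float)
    lower = np.quantile(pred_samples, (1.0 - level) / 2.0, axis=0)
    upper = np.quantile(pred_samples, (1.0 + level) / 2.0, axis=0)
    return float(np.mean((y >= lower) & (y <= upper)))


# Synthetic stand-in for a well-calibrated posterior predictive (an assumption,
# not data from the paper): targets and samples share the same distribution.
rng = np.random.default_rng(0)
n_samples, n_test = 500, 2000
mu = rng.normal(size=n_test)
y = mu + rng.normal(size=n_test)
pred_samples = mu[None, :] + rng.normal(size=(n_samples, n_test))

ranks = rank_statistic(pred_samples, y)
cov95 = central_coverage(pred_samples, y, 0.95)
print("mean rank statistic:", ranks.mean())
print("observed 95% coverage:", cov95)
```

For a calibrated posterior predictive, the rank statistics should be roughly uniform on $[0,1]$ and the observed coverage should sit near the target level. A kernel density estimate of `ranks` would give a curve of the kind described for the right panel of Figure 4.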