Partial Trace-Class Bayesian Neural Networks
Arran Carter, Torben Sell
TL;DR
The paper tackles the computational burden of uncertainty quantification in Bayesian neural networks by introducing Partial Trace-Class Bayesian Neural Networks (PaTraC BNNs), which reduce the number of Bayesian parameters while preserving informative uncertainty estimates. It leverages a trace-class prior that imposes an intrinsic ordering of network parameters and employs Hilbert-space MCMC (pCNL) for posterior inference, enabling efficient training even in high dimensions. Three architectures—Sep-PaTraC, Out-PaTraC, and Mix-PaTraC—are proposed to mix Bayesian and non-Bayesian components, with Mix-PaTraC offering the closest approximation to a full BNN at lower cost. Across synthetic and real datasets (CIFAR-10 and Abalone), PaTraC BNNs achieve competitive uncertainty quantification with substantial speedups and reduced memory usage, providing a scalable path for reliable uncertainty in deep learning applications. The work also outlines theoretical extensions and practical considerations, including environmental benefits and potential infinite-width analyses.
Abstract
Bayesian neural networks (BNNs) allow rigorous uncertainty quantification in deep learning, but often come at a prohibitive computational cost. We propose three different innovative architectures of partial trace-class Bayesian neural networks (PaTraC BNNs) that enable uncertainty quantification comparable to standard BNNs but use significantly fewer Bayesian parameters. These PaTraC BNNs have computational and statistical advantages over standard Bayesian neural networks in terms of speed and memory requirements. Our proposed methodology therefore facilitates reliable, robust, and scalable uncertainty quantification in neural networks. The three architectures build on trace-class neural network priors which induce an ordering of the neural network parameters, and are thus a natural choice in our framework. In a numerical simulation study, we verify the claimed benefits, and further illustrate the performance of our proposed methodology on a real-world dataset.
