On Feynman--Kac training of partial Bayesian neural networks

Zheng Zhao, Sebastian Mair, Thomas B. Schön, Jens Sjölund

TL;DR

This work tackles the training of partial Bayesian neural networks (pBNNs), in which only a subset of the weights is stochastic and latent-variable inference is needed. It casts training as simulating a Feynman–Kac model and develops scalable sequential Monte Carlo (SMC) samplers that jointly estimate the deterministic parameters $\psi$ and the latent posterior $p(\phi|y_{1:N};\psi)$, addressing multi-modality in the latent space. Two algorithms are proposed: SGSMC uses stochastic gradients with mini-batches, while OHSMC warm-starts from prior posteriors and updates parameters and posteriors concurrently, with extensions such as Poisson estimators to reduce bias. Across synthetic, UCI, and MNIST experiments, these methods achieve state-of-the-art or competitive predictive performance relative to MAP-HMC, SWAG, and VB, demonstrating practical scalability for uncertainty quantification in pBNNs.

Abstract

Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. Using various synthetic and real-world datasets we show that our proposed training scheme outperforms the state of the art in terms of predictive performance.

Paper Structure (56 sections, 11 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Traces of parameter estimations. The true value is 1.
  • Figure 2: Posterior approximation for the model in Equation \ref{equ:crescent}.
  • Figure 3: Visualisation of the synthetic regression. The scatter points represent the test data and the dashed line depicts the true function. The grey lines are predictive samples from their learnt pBNNs.
  • Figure 4: Visualisation of the two-moons classifications. The scatter points are the test data with hollow/solid representing the label. The grey lines represent the classification hyperplanes sampled from the trained pBNNs.
  • Figure 5: Visualising the flow of the posterior and parameter estimates by OHSMC. In the first epoch, the samples are drawn from the prior. As the number of epochs increases, the estimates for both $\psi$ and $\phi$ converge.
  • ...and 2 more figures