Functional Stochastic Gradient MCMC for Bayesian Neural Networks

Mengjing Wu, Junyu Xuan, Jie Lu

TL;DR

Novel functional MCMC schemes, including stochastic gradient versions, are introduced based on newly designed diffusion dynamics that can incorporate more informative functional priors. The stationary measure of these functional dynamics is proved to be the target posterior over functions.

Abstract

Classical parameter-space Bayesian inference for Bayesian neural networks (BNNs) suffers from several unresolved prior issues, such as knowledge encoding intractability and pathological behaviours in deep networks, which can lead to improper posterior inference. To address these issues, functional Bayesian inference has recently been proposed leveraging functional priors, such as the emerging functional variational inference. In addition to variational methods, stochastic gradient Markov Chain Monte Carlo (MCMC) is another scalable and effective inference method for BNNs to asymptotically generate samples from the true posterior by simulating continuous dynamics. However, existing MCMC methods perform solely in parameter space and inherit the unresolved prior issues, while extending these dynamics to function space is a non-trivial undertaking. In this paper, we introduce novel functional MCMC schemes, including stochastic gradient versions, based on newly designed diffusion dynamics that can incorporate more informative functional priors. Moreover, we prove that the stationary measure of these functional dynamics is the target posterior over functions. Our functional MCMC schemes demonstrate improved performance in both predictive accuracy and uncertainty quantification on several tasks compared to naive parameter-space MCMC and functional variational inference.
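
To make the phrase "simulating continuous dynamics" concrete, the following is a minimal sketch of standard parameter-space stochastic gradient Langevin dynamics (SGLD), the kind of baseline the abstract contrasts against. It is not the functional scheme proposed in the paper. The toy model (a Gaussian mean with a Gaussian prior), the function name `sgld_gaussian_mean`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random


def sgld_gaussian_mean(data, n_steps=5000, step_size=1e-3, batch_size=10,
                       prior_std=10.0, seed=0):
    """Sample the mean of a unit-variance Gaussian with standard SGLD.

    Assumed toy model (not from the paper):
        theta ~ N(0, prior_std^2),   y_i | theta ~ N(theta, 1).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Minibatch drawn without replacement from the full data set.
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior with respect to theta.
        grad_prior = -theta / prior_std ** 2
        # Unbiased minibatch estimate of the log-likelihood gradient.
        grad_lik = (n / batch_size) * sum(y - theta for y in batch)
        # Langevin step: half-step along the gradient plus Gaussian noise
        # whose variance equals the step size.
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta = theta + 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples
```

With a small step size, the samples collected after a burn-in period concentrate around the posterior mean of `theta`. In this parameter-space form the prior enters only through `grad_prior`, which is the term that a prior specified over functions would have to replace.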

Paper Structure (34 sections, 2 theorems, 20 equations, 8 figures, 12 tables, 2 algorithms)

Key Result

Proposition 3.1

The stationary probability measure of the functional Langevin dynamics defined in Eq. (eq:flg) is the target posterior over functions $P_{f|\mathcal{D}}$.

Figures (8)

  • Figure 1: 1-D extrapolation example. The green line is the ground-truth function, and the blue lines are the means of samples from the posterior predictive. Black dots denote the 20 training points; shaded areas represent the predictive standard deviations. For more details, see \ref{apd:expset}.
  • Figure 2: Comparison of the cumulative regret of fSGLD, SGLD, FBNN, IFBNN, and BBB on the contextual bandit task with the Mushroom dataset. Lower is better.
  • Figure 3: Log-posterior probability versus the number of iterations.
  • Figure 4: The effect of sample size on the 1-D extrapolation example for fSGLD, SGLD, fSGHMC, and SGHMC. The number after the hyphen in each subplot title is the sample size.
  • Figure 5: The effect of the number of measurement points on the gradient estimation of the functional prior for fSGLD and fSGHMC. The number after the hyphen in each subplot title is the number of measurement points.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof