Bayesian Computation in Deep Learning

Wenlong Chen, Bolian Li, Ruqi Zhang, Yingzhen Li

TL;DR

This chapter surveys how Bayesian computation can be integrated with deep learning to quantify predictive uncertainty and train generative models. It covers two main approximation strategies—stochastic gradient MCMC (SG-MCMC) and variational inference (VI)—and their extensions, including SGLD, SGHMC, cyclical step sizes, alpha-divergences, and structured posterior families. It then extends to deep generative models, detailing energy-based models, diffusion/score-based methods, and deep latent variable models with VAEs and hybrids of VI and MCMC. Key challenges include high-dimensional weight spaces, multi-modal posteriors, and the scalability of inference in large datasets, with practical guidance on balancing accuracy and computational cost. Overall, the chapter emphasizes predictive uncertainty, scalable posterior inference, and the integration of VI and MCMC as central themes for Bayesian computation in modern deep learning.

Abstract

Bayesian methods have shown success in deep learning applications. For example, in predictive tasks, Bayesian neural networks use Bayesian reasoning about model uncertainty to improve the reliability and uncertainty awareness of deep neural networks. In the generative modeling domain, many widely used deep generative models, such as deep latent variable models, require approximate Bayesian inference to infer their latent variables during training. In this chapter, we provide an introduction to approximate inference techniques as Bayesian computation methods applied to deep learning models, with a focus on Bayesian neural networks and deep generative models. We review arguably the two most popular approximate Bayesian computation methods, stochastic gradient Markov chain Monte Carlo (SG-MCMC) and variational inference (VI), and explain the unique challenges they face in posterior inference, as well as the solutions, when applied to deep learning models.

Paper Structure

This paper contains 19 sections, 47 equations, 7 figures, 4 algorithms.

Figures (7)

  • Figure 1: Difference between standard deep neural network and Bayesian neural network.
  • Figure 2: Comparison between the cyclical stepsize schedule (red) and the traditional decreasing stepsize schedule (blue) for SG-MCMC algorithms. Adapted from zhangcyclical. Used with the kind permission of Ruqi Zhang.
  • Figure 3: Factorized Gaussians fitted by minimizing $\alpha$-divergences with different values of $\alpha$ for a correlated 2D Gaussian target.
  • Figure 4: The training procedure of energy-based models. Generated data is obtained by sampling from the neural network-parameterized distribution $\bm{X}_{\text{fake}}\sim\exp(-E_{\bm{\theta}}(\bm{x}))$. The model parameter $\bm{\theta}$ is updated by minimizing the contrastive divergence between the real and generated data.
  • Figure 5: In diffusion models, the forward process progressively introduces noise, whereas the reverse process aims to denoise the perturbed data. The denoising step typically involves the estimation of the score function. Adapted from yang2022diffusion. Used with kind permission of Ling Yang.
  • ...and 2 more figures