Bayesian Computation in Deep Learning
Wenlong Chen, Bolian Li, Ruqi Zhang, Yingzhen Li
TL;DR
This chapter surveys how Bayesian computation can be integrated with deep learning to quantify predictive uncertainty and train generative models. It covers two main approximation strategies, stochastic gradient MCMC (SG-MCMC) and variational inference (VI), along with their variants and extensions, including SGLD, SGHMC, cyclical step sizes, alpha-divergences, and structured posterior families. It then turns to deep generative models, detailing energy-based models, diffusion/score-based methods, and deep latent variable models, including VAEs and hybrids of VI and MCMC. Key challenges include high-dimensional weight spaces, multi-modal posteriors, and the scalability of inference on large datasets; the chapter also offers practical guidance on balancing accuracy and computational cost. Overall, it emphasizes predictive uncertainty, scalable posterior inference, and the integration of VI and MCMC as central themes for Bayesian computation in modern deep learning.
Abstract
Bayesian methods have shown success in deep learning applications. For example, in predictive tasks, Bayesian neural networks use Bayesian reasoning about model uncertainty to improve the reliability and uncertainty awareness of deep neural networks. In the generative modeling domain, many widely used deep generative models, such as deep latent variable models, require approximate Bayesian inference to infer their latent variables during training. In this chapter, we provide an introduction to approximate inference techniques as Bayesian computation methods applied to deep learning models, with a focus on Bayesian neural networks and deep generative models. We review arguably the two most popular approximate Bayesian computation methods, stochastic gradient Markov chain Monte Carlo (SG-MCMC) and variational inference (VI), and explain the unique challenges each faces in posterior inference, as well as the solutions, when applied to deep learning models.
