Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Theodore Papamarkou; Maria Skoularidou; Konstantina Palla; Laurence Aitchison; Julyan Arbel; David Dunson; Maurizio Filippone; Vincent Fortuin; Philipp Hennig; José Miguel Hernández-Lobato; Aliaksandr Hubin; Alexander Immer; Theofanis Karaletsos; Mohammad Emtiyaz Khan; Agustinus Kristiadi; Yingzhen Li; Stephan Mandt; Christopher Nemeth; Michael A. Osborne; Tim G. J. Rudner; David Rügamer; Yee Whye Teh; Max Welling; Andrew Gordon Wilson; Ruqi Zhang

TL;DR

The paper argues that Bayesian Deep Learning is essential in the age of large-scale AI because it provides principled uncertainty quantification, data efficiency, and adaptability to evolving domains, all of which are crucial for safe and reliable deployment of foundation models. It surveys the strengths and current challenges of BDL—posterior inference, priors, scalability, and foundation-model integration—and outlines concrete future directions, including novel posterior samplers, hybrid Bayesian approaches, deep kernel methods, semi/self-supervised Bayesian learning, probabilistic numerics, and compression. By linking these methodological advances to practical needs in uncertainty-aware decision-making, the authors advocate for integrating BDL with large models to unlock robust, trustworthy AI across domains. The discussion emphasizes the potential of BDL to enhance reliability and interpretability, while calling for scalable tooling, benchmarks, and application-driven development, particularly for foundation-model workflows.

Abstract

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

Paper Structure (33 sections, 39 equations, 2 figures)

Introduction
Why Bayesian Deep Learning Matters
Uncertainty Quantification
Data Efficiency
Adaptability to New and Evolving Domains
Model Misspecification and Interpretability
Current Challenges
Laplace and Variational Approximations
Ensembles
Posterior Sampling Algorithms
Prior Specification
Scalability
Foundation Models
Proposed Future Directions
Posterior Sampling Algorithms
...and 18 more sections

Figures (2)

Figure 1: Popular LLM chat assistants, such as Bing Chat (using GPT-4) and LLAMA-2-70B, often produce wrong answers with very high confidence, indicating that their confidence is not calibrated. BDL has traditionally been used to overcome this kind of overconfidence problem, and yet it is underutilized in the LLM era. Note that OS(=O)(=O)O is a textual representation of the well-known molecule H$_2$SO$_4$ and can easily be looked up on Wikipedia. Emphasis and ellipsis ours. Accessed on 2024-01-23.
Figure 2: Different BDL methods for approximating a posterior $p(\theta \mid \mathcal{D})$ on a parameter space $\Theta$. While Laplace and Gaussian-based variational approaches yield Gaussian approximations, they generally capture different local modes of the posterior. Ensemble methods use maximum a posteriori estimates as their samples.
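
To make the Laplace approximation named in the Figure 2 caption concrete, here is a minimal sketch on a one-dimensional toy problem; it is not taken from the paper. It assumes a Bernoulli likelihood with a logistic link, a zero-mean Gaussian prior on the single parameter $\theta$, and made-up data (7 successes in 10 trials). The function name `laplace_approximation` and its arguments are hypothetical. The sketch finds the maximum a posteriori (MAP) estimate with Newton's method and uses the negative inverse curvature of the log-posterior at that point as the variance of the Gaussian approximation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def laplace_approximation(k, n, prior_var=1.0, iters=50):
    """Gaussian (Laplace) approximation to a 1D toy posterior.

    Assumed model (illustrative only):
      theta ~ N(0, prior_var), k successes out of n Bernoulli(sigmoid(theta)) trials.
    Log-posterior up to a constant:
      k * theta - n * log(1 + exp(theta)) - theta**2 / (2 * prior_var)
    """
    theta = 0.0
    for _ in range(iters):
        p = sigmoid(theta)
        grad = k - n * p - theta / prior_var          # first derivative
        hess = -n * p * (1.0 - p) - 1.0 / prior_var   # second derivative (always negative)
        theta -= grad / hess                          # Newton step towards the MAP
    # Curvature at the MAP gives the variance of the Gaussian approximation.
    p = sigmoid(theta)
    hess = -n * p * (1.0 - p) - 1.0 / prior_var
    return theta, -1.0 / hess


theta_map, var = laplace_approximation(k=7, n=10)
print(f"MAP estimate: {theta_map:.4f}, approximate posterior variance: {var:.4f}")
```

The resulting Gaussian is centred on a single mode, which is why, as the caption notes, such approximations are local. An ensemble in the caption's sense would instead collect several MAP estimates (for example, from different initialisations of a multimodal model) and treat them as samples.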
