Probabilistic Artificial Intelligence

Andreas Krause, Jonas Hübotter

TL;DR

This work articulates a cohesive probabilistic framework for learning and decision making under uncertainty, unifying Bayesian inference, Gaussian processes, Kalman filtering, and variational methods. It highlights the central role of Bayes' rule in updating beliefs, and distinguishes epistemic from aleatoric uncertainty to guide data acquisition and safe exploration. By developing both weight-space and function-space (kernel) perspectives, the text shows how exact and approximate inference can be achieved in linear, nonlinear, and temporal settings. The resulting paradigm supports principled learning and prediction in complex AI tasks, from regression and filtering to sequential decision making, while offering scalable approximations for real-world data. Overall, the manuscript provides a comprehensive toolkit for probabilistic reasoning across static and sequential domains, connecting classical theory with modern ML practice.

Abstract

Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have enabled computer systems to perceive the world in unprecedented ways. Reinforcement learning has enabled breakthroughs in complex games such as Go and challenging robotics tasks such as quadrupedal locomotion. A key aspect of intelligence is to not only make predictions, but reason about the uncertainty in these predictions, and to consider this uncertainty when making decisions. This is what this manuscript on "Probabilistic Artificial Intelligence" is about. The first part covers probabilistic approaches to machine learning. We discuss the differentiation between "epistemic" uncertainty due to lack of data and "aleatoric" uncertainty, which is irreducible and stems, e.g., from noisy observations and outcomes. We discuss concrete approaches towards probabilistic inference and modern approaches to efficient approximate inference. The second part of the manuscript is about taking uncertainty into account in sequential decision tasks. We consider active learning and Bayesian optimization -- approaches that collect data by proposing experiments that are informative for reducing the epistemic uncertainty. We then consider reinforcement learning and modern deep RL approaches that use neural network function approximation. We close by discussing modern approaches in model-based RL, which harness epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.

Paper Structure

This paper contains 227 sections, 45 theorems, 810 equations, 44 figures, 1 table, 25 algorithms.

Key Result

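Theorem 1.9 (Tower rule)

Given random vectors $\mathbf{X}$ and $\mathbf{Y}$, we have $\mathbb{E}_{\mathbf{Y}}[\mathbb{E}_{\mathbf{X}}[\mathbf{X} \mid \mathbf{Y}]] = \mathbb{E}[\mathbf{X}]$.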

Figures (44)

  • Figure 1: Shown are the PDFs of two-dimensional Gaussians with mean $\mathbold{0}$ and covariance matrices $\mathbold{\Sigma}_1 \overset{.}{=} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbold{\Sigma}_2 \overset{.}{=} \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}$, respectively.
  • Figure 2: Comparison of linear regression (MLE), ridge regression (MAP estimate), and Bayesian linear regression when the data is generated according to $y \mid \mathbold{w}, \mathbold{x} \sim \mathcal{N}(\mathbold{w}^\top\mathbold{x}, \sigma_{\mathrm{n}}^2)$. The true mean is shown in black, the MLE in blue, and the MAP estimate in red. The dark gray area denotes the epistemic uncertainty of Bayesian linear regression and the light gray area the additional homoscedastic noise. On the left, $\sigma_{\mathrm{n}} = 0.15$. On the right, $\sigma_{\mathrm{n}} = 0.7$.
  • Figure 3: Schematic view of Bayesian filtering: An agent perceives the current state of the world and updates its belief accordingly.
  • Figure 6: The top plot shows contour lines of an empirical Bayes objective with two local optima. The bottom two plots show the Gaussian processes corresponding to the two optimal models. The left model, with the smaller lengthscale, is chosen from a more flexible class of models, while the right model explains more observations through noise. Adapted from Figure 5.5 of "Gaussian Processes for Machine Learning" (Rasmussen and Williams).
  • Figure 7: A commutative diagram of sampling and optimization algorithms. Langevin dynamics (LD) is the non-stochastic variant of SGLD.
  • ...and 39 more figures
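
The comparison described in the Figure 2 caption can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes a zero-mean isotropic Gaussian prior on the weights with standard deviation `prior_std` (a hypothetical parameter), a bias feature, and synthetic data. The function names `fit_linear_models` and `predictive_variance` are made up for this example. Under these assumptions the MAP estimate coincides with ridge regression, and the predictive variance splits into an epistemic part (from the weight posterior) and an aleatoric part (the noise variance $\sigma_{\mathrm{n}}^2$).

```python
import numpy as np


def fit_linear_models(X, y, noise_std, prior_std=1.0):
    """Return MLE weights, MAP (ridge) weights, and the posterior covariance.

    Assumes y | w, x ~ N(w^T x, noise_std^2) and a prior w ~ N(0, prior_std^2 I).
    The prior is an assumption of this sketch.
    """
    d = X.shape[1]
    # MLE: ordinary least squares.
    w_mle = np.linalg.lstsq(X, y, rcond=None)[0]
    # Posterior precision of the weights under the Gaussian prior and likelihood.
    precision = X.T @ X / noise_std**2 + np.eye(d) / prior_std**2
    cov = np.linalg.inv(precision)
    # Posterior mean, which equals the MAP estimate for a Gaussian posterior.
    w_map = cov @ X.T @ y / noise_std**2
    return w_mle, w_map, cov


def predictive_variance(x_star, cov, noise_std):
    """Split the predictive variance at x_star into epistemic and aleatoric parts."""
    epistemic = float(x_star @ cov @ x_star)  # uncertainty about the weights
    aleatoric = noise_std**2                  # homoscedastic observation noise
    return epistemic, aleatoric


# Synthetic data (hypothetical): 20 points, a bias feature plus one input.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0])
X = np.column_stack([np.ones(20), rng.uniform(-1.0, 1.0, 20)])
noise_std = 0.15  # value of sigma_n in the left panel of Figure 2
y = X @ w_true + noise_std * rng.normal(size=20)

w_mle, w_map, cov = fit_linear_models(X, y, noise_std)
epistemic, aleatoric = predictive_variance(np.array([1.0, 2.0]), cov, noise_std)
```

With this parameterization the ridge penalty is `noise_std**2 / prior_std**2`, so a larger noise level or a tighter prior pulls the MAP estimate further from the MLE and toward zero.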

Theorems & Definitions (120)

  • Definition 1.1: $\sigma$-algebra
  • Definition 1.2: Probability measure
  • Definition 1.3: Probability space
  • Definition 1.4: Random variable
  • Definition 1.6: Conditional probability
  • Theorem 1.9: Tower rule
  • proof : Proof sketch
  • Theorem 1.10: Law of total variance, LOTV
  • proof : Proof sketch of LOTV
  • Theorem 1.12: Bayes' rule
  • ...and 110 more
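
The tower rule (Theorem 1.9) and the law of total variance (Theorem 1.10) listed above can be checked by simulation. The following is a rough Monte Carlo sketch under an assumed toy model chosen for this example only: $Y \sim \mathcal{N}(0, 1)$ and $X \mid Y \sim \mathcal{N}(2Y + 1, 0.5^2)$, so that $\mathbb{E}[X \mid Y] = 2Y + 1$ and $\mathrm{Var}[X \mid Y] = 0.25$ are known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed toy model: Y ~ N(0, 1), X | Y ~ N(2Y + 1, 0.5^2).
y = rng.normal(0.0, 1.0, n)
x = rng.normal(2.0 * y + 1.0, 0.5)

# Closed-form conditional moments for this model.
cond_mean = 2.0 * y + 1.0       # E[X | Y]
cond_var = np.full(n, 0.25)     # Var[X | Y]

# Tower rule: E_Y[E[X | Y]] should match E[X].
tower_lhs = cond_mean.mean()
tower_rhs = x.mean()

# Law of total variance: Var[X] = E_Y[Var[X | Y]] + Var_Y[E[X | Y]].
lotv_lhs = x.var()
lotv_rhs = cond_var.mean() + cond_mean.var()

print(f"tower rule: {tower_lhs:.3f} vs {tower_rhs:.3f}")
print(f"LOTV:       {lotv_lhs:.3f} vs {lotv_rhs:.3f}")
```

For this model the exact values are $\mathbb{E}[X] = 1$ and $\mathrm{Var}[X] = 4 + 0.25 = 4.25$, so both sides of each identity should agree with these values up to sampling error.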