Probabilistic Inference and Learning with Stein's Method

Qiang Liu, Lester Mackey, Chris Oates

TL;DR

This monograph provides a rigorous overview of theoretical and methodological aspects of probabilistic inference and learning with Stein's method and describes the connection between Stein operators and Stein variational gradient descent.

Abstract

This monograph provides a rigorous overview of theoretical and methodological aspects of probabilistic inference and learning with Stein's method. Recipes are provided for constructing Stein discrepancies from Stein operators and Stein sets, and properties of these discrepancies such as computability, separation, convergence detection, and convergence control are discussed. Further, the connection between Stein operators and Stein variational gradient descent is set out in detail. The main definitions and results are precisely stated, and references to all proofs are provided.

Paper Structure

This paper contains 84 sections, 54 theorems, 294 equations, and 10 figures.

Key Result

Theorem 2.1

Let $\mu$ and $\nu$ be probability measures on a measurable space $(\Omega,\mathcal{S})$ with $\mu \ll \nu$. Then there exists a measurable function $p : \Omega \rightarrow [0,\infty)$ such that $\mu(S) = \int p(\omega) 1_S(\omega) \; \mathrm{d}\nu(\omega)$ for any $S \in \mathcal{S}$.
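
As a small illustration of the statement above, the following sketch checks the Radon--Nikodym identity on a finite space, where the density reduces to the ratio $p(\omega) = \mu(\{\omega\}) / \nu(\{\omega\})$ and the integral becomes a finite sum. The three-point space and the particular values of $\mu$ and $\nu$ are assumptions chosen for illustration; they do not come from the monograph.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite sample space with the power set as sigma-algebra.
omega = [0, 1, 2]

# Illustrative measures; nu charges every point, so mu << nu holds.
nu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
mu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

# On a finite space the Radon--Nikodym density is the pointwise ratio.
p = {w: mu[w] / nu[w] for w in omega}


def subsets(items):
    """Enumerate every subset of items, i.e. every measurable set here."""
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def measure(m, S):
    """Evaluate the measure m on the set S."""
    return sum((m[w] for w in S), Fraction(0))


def integral_of_density(S):
    """Compute the integral of p * 1_S with respect to nu as a finite sum."""
    return sum((p[w] * nu[w] for w in S), Fraction(0))


# The identity mu(S) = int p 1_S dnu should hold for every S.
all_match = all(
    measure(mu, S) == integral_of_density(S) for S in subsets(omega)
)
print(all_match)
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.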

Figures (10)

  • Figure 1: Samples generated using the unadjusted Langevin algorithm (ULA). Here the target $P$ is a Gaussian mixture model with density $p$ and $\epsilon$ denotes the step size parameter of ULA. Contours of $\log p$ are depicted.
  • Figure 2: Tuning the unadjusted Langevin algorithm (ULA). The left panel presents the mean square error associated with the empirical estimates for the means and variances of $P$ (which cannot be computed in general), while the right panel presents the kernel Stein discrepancy (KSD), which can be computed. Here we used the inverse multi-quadric kernel (\ref{ex: IMQ kernel}) with exponent $\beta = 1$ and bandwidth $\ell = 0.01$. The experiment was repeated 10 times and means and standard errors are reported.
  • Figure 3: Goodness-of-fit testing with KSD. Here we plot the null distribution of the KSD test statistic as approximated using the wild bootstrap (blue), the actual value of the KSD test statistic (black), and the values for which the null would be rejected (shaded). The amount of model misspecification is denoted with $\sigma$, so that when $\sigma = 0$ the model is correct. The inverse multi-quadric kernel with length scale $\ell$ was used.
  • Figure 4: Goodness-of-fit testing with KSD. Here we plot the test power (i.e. the probability of rejecting the null) as a function of both the amount of model misspecification $\sigma$ and the kernel length scale $\ell$. Means and standard errors over 100 experiments are reported.
  • Figure 5: Stein points (left) and Frank--Wolfe--Stein points (right) were used to approximate the Rosenbrock target (\ref{eq: Rosenbrock}), shown as black shading.
  • ...and 5 more figures
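
For readers unfamiliar with the sampler named in Figures 1 and 2, a minimal sketch of ULA for a Gaussian mixture target might look as follows. It assumes the common update $x_{k+1} = x_k + \epsilon \nabla \log p(x_k) + \sqrt{2\epsilon}\, \xi_k$ with $\xi_k$ standard normal; the monograph may scale the step size differently. The two-component, equal-weight, identity-covariance mixture and the names `grad_log_p` and `ula` are hypothetical choices for illustration, not the authors' experimental setup.

```python
import numpy as np


def grad_log_p(x, means):
    """Gradient of log p for an equal-weight mixture of N(m_j, I) components."""
    diffs = means - x                          # shape (J, d), rows are m_j - x
    log_w = -0.5 * np.sum(diffs ** 2, axis=1)  # unnormalised log responsibilities
    w = np.exp(log_w - log_w.max())            # subtract the max for stability
    w /= w.sum()
    return w @ diffs                           # sum_j w_j(x) (m_j - x)


def ula(x0, eps, n_steps, means, rng):
    """Run ULA with step size eps and return the visited states."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x + eps * grad_log_p(x, means) + np.sqrt(2.0 * eps) * noise
        samples[k] = x
    return samples


# Assumed mixture means and step size, chosen only for this example.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
rng = np.random.default_rng(0)
samples = ula(x0=[0.0, 0.0], eps=0.1, n_steps=1000, means=means, rng=rng)
print(samples.shape)
```

Because ULA omits a Metropolis correction, its samples are biased for any fixed $\epsilon$, which is why the step size needs tuning as in Figure 2.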

Theorems & Definitions (164)

  • Definition 2.1: $\sigma$-algebra
  • Definition 2.2: Measure
  • Example 2.1: Atomic measure
  • Example 2.2: Discrete distributions
  • Example 2.3: Borel measures
  • Example 2.4: Lebesgue measures
  • Definition 2.3: Measurable function
  • Definition 2.4: Lebesgue integral
  • Theorem 2.1: Radon--Nikodym
  • Definition 2.5: Continuous distributions
  • ...and 154 more