
Stein transport for Bayesian inference

Nikolas Nüsken

TL;DR

This work introduces Stein transport, a kernel-based particle-transport method for Bayesian inference that moves an ensemble along a fixed tempering curve from the prior to the posterior distribution. By solving a kernel ridge regression problem in a reproducing kernel Hilbert space (RKHS) and using a time-varying Stein operator, the method yields a posterior approximation at the finite time $t=1$, with connections to Stein variational gradient descent (SVGD) and to the Stein geometry. The paper develops a mean-field theory, analyses the errors due to regularisation and finite-particle effects, relates the method to birth-death dynamics and Fisher-Rao gradient flows, and proposes adjusted Stein transport to improve stability and accuracy. Numerical experiments show that Stein transport often achieves more accurate posterior approximations than SVGD at a substantially lower computational budget and mitigates the variance collapse commonly observed in SVGD, making it a promising alternative for scalable Bayesian inference.

Abstract

We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.
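
For orientation, the SVGD baseline mentioned in the abstract can be sketched as below. This is the standard SVGD particle update with a Gaussian kernel in one dimension, not Stein transport: the time-varying score and the particle weights that distinguish Stein transport are not reproduced here. The function names, the bandwidth `ell`, the step size, and the Gaussian target are illustrative assumptions.

```python
import math

def rbf(x, y, ell=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))

def svgd_step(particles, score, step=0.1, ell=1.0):
    """One standard SVGD update of all particles in 1D.

    `score` is the derivative of the log target density (fixed in time).
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, ell)
            grad_k = -(xj - xi) / ell ** 2 * k  # derivative of k(xj, xi) in xj
            phi += k * score(xj) + grad_k       # attraction term + repulsion term
        updated.append(xi + step * phi / n)
    return updated

# Illustrative target N(3, 1), whose score is -(x - 3).
def gaussian_score(x):
    return -(x - 3.0)

# SVGD iterates this update many times and relies on the long-time limit.
particles = [-1.0, 0.0, 1.0]
for _ in range(200):
    particles = svgd_step(particles, gaussian_score)
```

The loop at the end reflects the point made in the abstract: SVGD has no natural stopping time, whereas Stein transport is integrated over the fixed interval $t \in [0,1]$.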

Paper Structure (33 sections, 15 theorems, 137 equations, 5 figures, 2 algorithms)

Key Result

Proposition 1

Assume that the time-dependent vector field $v_t \in C^1(\mathbb{R}^d;\mathbb{R}^d)$ satisfies the Stein equation for all $t \in [0,1]$. Then the ordinary differential equation (ODE) $\tfrac{\mathrm{d}X_t}{\mathrm{d}t} = v_t(X_t)$ with random initial condition $X_0 \sim \pi_0$ reproduces the tempered interpolation $(\pi_t)_{t \in [0,1]}$, in the sense that $\mathrm{Law}(X_t) = \pi_t$ for all $t \in [0,1]$, whenever the ODE is well posed.
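
As a sanity check on Proposition 1, the following minimal sketch works through a one-dimensional conjugate Gaussian toy problem. Everything specific here is an assumption for illustration and is not taken from the paper: the prior $\pi_0 = N(0,1)$, the observation `Y` and noise variance `S2`, the tempering curve written as $\pi_t \propto \pi_0 e^{-t h}$ with $h(x) = (x - Y)^2 / (2\,\mathrm{S2})$, the Stein equation written as $v_t' + (\log \pi_t)'\, v_t = h - \mathbb{E}_{\pi_t}[h]$, and all helper names. Under these assumptions every $\pi_t$ is Gaussian, an affine vector field solves the equation in closed form, and integrating the ODE $\dot{X}_t = v_t(X_t)$ should carry each prior point to the matching posterior point at $t = 1$.

```python
import math

# Hypothetical toy problem: prior N(0, 1), one observation Y with noise variance S2.
Y, S2 = 2.0, 0.5

def precision(t):
    """Precision (inverse variance) of the tempered Gaussian pi_t."""
    return 1.0 + t / S2

def mean(t):
    """Mean of the tempered Gaussian pi_t."""
    return (t * Y / S2) / precision(t)

def velocity(t, x):
    """Affine vector field v_t(x) = a_t (x - m_t) + dm_t/dt."""
    p = precision(t)
    a = -0.5 / (S2 * p)        # time derivative of log(std_t)
    dm = (Y / S2) / p ** 2     # time derivative of the mean m_t
    return a * (x - mean(t)) + dm

def stein_residual(t, x):
    """Left side minus right side of the assumed 1D Stein equation."""
    p, m = precision(t), mean(t)
    dv = -0.5 / (S2 * p)                 # derivative of v_t in x
    score = -p * (x - m)                 # derivative of log pi_t in x
    h = (x - Y) ** 2 / (2.0 * S2)
    mean_h = ((m - Y) ** 2 + 1.0 / p) / (2.0 * S2)  # E_{pi_t}[h]
    return dv + score * velocity(t, x) - (h - mean_h)

def transport(x0, steps=400):
    """Integrate dX/dt = v_t(X) from t = 0 to t = 1 with classical RK4."""
    dt = 1.0 / steps
    x, t = x0, 0.0
    for _ in range(steps):
        k1 = velocity(t, x)
        k2 = velocity(t + dt / 2, x + dt / 2 * k1)
        k3 = velocity(t + dt / 2, x + dt / 2 * k2)
        k4 = velocity(t + dt, x + dt * k3)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return x
```

In this setting `stein_residual` should vanish up to rounding, and `transport(x0)` should agree with the closed-form map `mean(1.0) + x0 / math.sqrt(precision(1.0))`. The kernel ridge regression step that the paper uses to approximate $v_t$ from particles is not shown.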

Figures (5)

  • Figure 1: Posterior approximations for the Joker distribution.
  • Figure 2: KSD evolution for the Joker distribution.
  • Figure 3: Averaged variance $\tfrac{1}{d} \mathrm{Tr} \,\widehat{\mathrm{Cov}}$ of the approximate posterior, as a function of the dimension. The black line indicates the true value $\tfrac{1}{2}$.
  • Figure 4: Gaussian mixture with low-rank structure ($d = 50$, ensemble size $200$ particles). We show the marginals in the first two coordinates of the approximations obtained by SVGD and adjusted Stein transport.
  • Figure 5: Bayesian logistic regression for the Splice data set ($d = 60$): KSD and test accuracy along the time evolution of the particle system.

Theorems & Definitions (52)

  • Remark 1
  • Proposition 1: Stein equation
  • proof
  • Remark 2: Nonuniqueness
  • Proposition 2
  • Remark 3
  • proof
  • Lemma 1: Evolution of the score function
  • proof
  • Proposition 3
  • ...and 42 more