Stein transport for Bayesian inference
Nikolas Nüsken
TL;DR
This work introduces Stein transport, a kernel-based particle transport method for Bayesian inference that moves an ensemble along a fixed tempering curve from the prior to the posterior distribution. The driving vector field is obtained by solving a kernel ridge regression problem in an RKHS with a time-varying Stein operator, so that the posterior approximation is reached at finite time $t=1$; the method is closely related to Stein variational gradient descent (SVGD) and to the Stein geometry. The paper develops a mean-field theory, analyzes the errors due to regularisation and finite-particle effects (including an interpretation via birth-death dynamics and Fisher-Rao gradient flows), and proposes Adjusted Stein transport to improve stability and accuracy. Numerical experiments show that Stein transport often achieves more accurate posterior approximations than SVGD at a significantly lower computational budget, and that it mitigates the variance collapse commonly observed in SVGD.
Abstract
We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that, in comparison to SVGD, Stein transport often reaches more accurate posterior approximations with a significantly reduced computational budget and also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.
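To make the ingredients named in the abstract concrete, the following is a minimal sketch of a standard SVGD update driven by a time-varying, tempered score. It is not the Stein transport update itself: the particle weights and the kernel ridge regression solve are omitted, so this baseline has no guarantee of matching the posterior at $t=1$. Everything specific here is an assumption made for illustration: the geometric tempering $\pi_t \propto \text{prior} \times \text{likelihood}^t$, the Gaussian toy model, the RBF kernel with bandwidth `h`, the step size, and the function names `tempered_score`, `svgd_direction` and `run_tempered_svgd`.

```python
import numpy as np


def tempered_score(x, t, y, sigma):
    """Score grad log pi_t(x) for a toy model (an assumption, not from the paper):
    prior N(0, I), likelihood N(y | x, sigma^2 I), pi_t ∝ prior * likelihood^t."""
    return -x + t * (y - x) / sigma**2


def svgd_direction(x, score, h=1.0):
    """Standard SVGD direction with an RBF kernel of bandwidth h.

    x: (n, d) particles, score: (n, d) score evaluated at each particle.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]                  # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h**2))  # symmetric Gram matrix
    drift = K @ score                                     # sum_j k(x_j, x_i) score(x_j)
    repulsion = np.sum(K[:, :, None] * diff, axis=1) / h**2  # sum_j grad_{x_j} k(x_j, x_i)
    return (drift + repulsion) / n


def run_tempered_svgd(x0, y, sigma, n_steps=100, step=0.1, h=1.0):
    """Move particles while the target is tempered from t = 0 (prior) to t = 1.

    Plain SVGD steps on a moving target: a baseline sketch, not Stein transport.
    """
    x = np.array(x0, dtype=float, copy=True)
    for k in range(n_steps):
        t = (k + 1) / n_steps                             # tempering parameter in (0, 1]
        x = x + step * svgd_direction(x, tempered_score(x, t, y, sigma), h)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((100, 2))                    # samples from the prior N(0, I)
    y = np.array([2.0, 2.0])                              # a single hypothetical observation
    x1 = run_tempered_svgd(x0, y, sigma=1.0)
    print("ensemble mean before:", x0.mean(axis=0))
    print("ensemble mean after: ", x1.mean(axis=0))
    print("exact posterior mean:", y / 2.0)               # y / (1 + sigma^2) with sigma = 1
```

In this toy model the exact posterior is Gaussian with mean $y/(1+\sigma^2)$, which gives a simple reference for checking how far the ensemble has moved. According to the abstract, Stein transport differs from this baseline by attaching specific weights to the particles and by reaching its posterior approximation at $t=1$ rather than in the long-time limit.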
