Near-Optimal Approximations for Bayesian Inference in Function Space
Veit Wild, James Wu, Dino Sejdinovic, Jeremias Knoblauch
TL;DR
This work introduces a scalable, nonparametric approach to Bayesian inference in function spaces by casting the posterior as the stationary distribution of an RKHS-valued Langevin diffusion. By projecting the infinite-dimensional diffusion onto the first $M$ Kosambi-Karhunen-Loève components and using a sufficiency-based pushforward, the method (Projected Langevin Sampling, PLS) achieves a computational cost of $O(M^3+JM^2)$, where $J$ is the number of posterior samples drawn, while retaining high-fidelity posterior approximations. The authors prove near-optimality of the projected method relative to the best nonparametric variational posterior, with stronger results in the Gaussian-likelihood case, where PLS coincides with the sparse variational Gaussian process (SVGP). The framework accommodates arbitrary likelihoods and yields non-Gaussian posteriors, enabling multimodal inference that SVGPs cannot capture. Numerically, PLS matches or surpasses SVGP on regression and classification benchmarks, offers clear advantages in non-Gaussian and multimodal settings, and allows parallelized computation for large-scale problems.
Abstract
We propose a scalable inference algorithm for Bayes posteriors defined on a reproducing kernel Hilbert space (RKHS). Given a likelihood function and a Gaussian random element representing the prior, the corresponding Bayes posterior measure $\Pi_{\text{B}}$ can be obtained as the stationary distribution of an RKHS-valued Langevin diffusion. We approximate the infinite-dimensional Langevin diffusion via a projection onto the first $M$ components of the Kosambi-Karhunen-Loève expansion. Using the resulting approximate posterior for these $M$ components, we perform inference for $\Pi_{\text{B}}$ by relying on the law of total probability and a sufficiency assumption. The resulting method scales as $O(M^3+JM^2)$, where $J$ is the number of samples drawn from the posterior measure $\Pi_{\text{B}}$. Interestingly, the algorithm recovers the posterior arising from the sparse variational Gaussian process (SVGP; Titsias, 2009) as a special case, owing to the fact that the sufficiency assumption underlies both methods. However, whereas the SVGP is parametrically constrained to be a Gaussian process, our method is based on a non-parametric variational family $\mathcal{P}(\mathbb{R}^M)$ consisting of all probability measures on $\mathbb{R}^M$. As a result, our method is provably close to the optimal $M$-dimensional variational approximation of the Bayes posterior $\Pi_{\text{B}}$ in $\mathcal{P}(\mathbb{R}^M)$ for convex and Lipschitz continuous negative log likelihoods, and coincides with SVGP for the special case of a Gaussian error likelihood.
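To make the idea of sampling a posterior over $M$ projected coefficients with a Langevin diffusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard normal prior on the $M$ coefficients, a Gaussian likelihood, and a random matrix `Phi` as a hypothetical stand-in for the truncated basis evaluated at the data. It uses a plain unadjusted Euler discretization and omits the pushforward to function space. All names (`projected_langevin`, `grad_potential`, `Phi`) are illustrative.

```python
import numpy as np


def projected_langevin(grad_potential, u0, step, n_steps, rng):
    """Run unadjusted Langevin updates on J particles in R^M (rows of u0)."""
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(u.shape)
        # Euler-Maruyama step for du = -grad U(u) dt + sqrt(2) dW
        u = u - step * grad_potential(u) + np.sqrt(2.0 * step) * noise
    return u


rng = np.random.default_rng(0)
N, M, J, sigma = 20, 3, 2000, 1.0

# Assumption: Phi is a hypothetical stand-in for the first M basis functions
# evaluated at N inputs; it is not the basis construction from the paper.
Phi = rng.standard_normal((N, M))
y = Phi @ np.array([1.0, -0.5, 0.25]) + sigma * rng.standard_normal(N)


def grad_potential(u):
    """Gradient of U(u) = ||y - Phi u||^2 / (2 sigma^2) + ||u||^2 / 2, per row of u."""
    resid = u @ Phi.T - y              # shape (J, N)
    return resid @ Phi / sigma**2 + u  # shape (J, M)


# Closed-form Gaussian posterior mean, used only as a sanity check.
A = Phi.T @ Phi / sigma**2 + np.eye(M)
exact_mean = np.linalg.solve(A, Phi.T @ y / sigma**2)

# Step size chosen well inside the stability region of the discretization.
step = 0.1 / np.linalg.eigvalsh(A).max()
particles = projected_langevin(grad_potential, np.zeros((J, M)), step, 500, rng)
```

In this Gaussian toy case the particle mean can be compared against `exact_mean`. For a non-Gaussian likelihood only `grad_potential` would change, which is the setting where a particle-based approximation can represent posteriors that a Gaussian family cannot.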
