Particle Semi-Implicit Variational Inference
Jen Ning Lim, Adam M. Johansen
TL;DR
Existing semi-implicit variational inference (SIVI) methods parameterize the mixing distribution implicitly, which makes the variational density intractable and rules out direct optimization of the ELBO. The paper addresses this by introducing Particle Variational Inference (PVI). PVI formulates a gradient flow in the Euclidean–Wasserstein geometry for the pair $(\theta,r)\in \Theta\times\mathcal{P}(\mathbb{R}^{d_z})$, where $\theta$ denotes the kernel parameters and $r$ the mixing distribution, with a regularized free energy $\mathcal{E}_\lambda(\theta,r)$ whose minimizers correspond to optimal variational posteriors. A particle-based discretization of this flow yields an algorithm that directly optimizes the ELBO without imposing a parametric form on $r$. A theoretical analysis establishes existence and uniqueness of solutions, together with propagation of chaos results, for the gradient flow of a related free energy functional. Empirically, PVI compares favourably with prior SIVI methods on density estimation, Bayesian logistic regression, and Bayesian neural networks, and its learned mixing distribution gives it greater expressivity. The work thus offers a principled route to richer variational families in Bayesian inference, along with insight into the associated gradient-flow dynamics.
Abstract
Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible, so they resort to one of the following: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a free energy functional. PVI arises naturally as a particle approximation of a Euclidean–Wasserstein gradient flow and, unlike prior works, it directly optimizes the ELBO whilst making no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably compared to other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.
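To illustrate why an empirical mixing measure makes the ELBO directly accessible, recall that a semi-implicit family has the form $q_{\theta,r}(x)=\int k_\theta(x\mid z)\,r(\mathrm{d}z)$. If $r$ is replaced by the empirical measure of $N$ particles $z_1,\dots,z_N$, this becomes the finite mixture $\frac{1}{N}\sum_{i=1}^{N}k_\theta(x\mid z_i)$, whose density can be evaluated whenever the kernel density can. The sketch below is not the PVI algorithm itself: it only evaluates a Monte Carlo ELBO estimate for fixed particles, and it assumes an isotropic Gaussian kernel with a linear mean $Wz+b$. That kernel choice and all function names are hypothetical and chosen for illustration.

```python
import numpy as np


def log_gaussian_kernel(x, z, W, b, sigma):
    """log k_theta(x | z) for the assumed kernel N(x; W z + b, sigma^2 I).

    x has shape (M, d_x), z has shape (N, d_z); returns an (M, N) array.
    """
    mean = z @ W.T + b                          # (N, d_x) kernel means
    diff = x[:, None, :] - mean[None, :, :]     # (M, N, d_x)
    d_x = x.shape[1]
    return (-0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2
            - 0.5 * d_x * np.log(2.0 * np.pi * sigma ** 2))


def log_q(x, z, W, b, sigma):
    """log of the particle mixture (1/N) sum_i k_theta(x | z_i)."""
    log_k = log_gaussian_kernel(x, z, W, b, sigma)
    m = log_k.max(axis=1, keepdims=True)        # stabilise the log-sum-exp
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return lse - np.log(z.shape[0])


def sample_q(rng, n, z, W, b, sigma):
    """Draw n samples: pick a particle uniformly, then sample the kernel."""
    idx = rng.integers(0, z.shape[0], size=n)
    mean = z[idx] @ W.T + b
    return mean + sigma * rng.standard_normal(mean.shape)


def elbo_estimate(rng, log_p, z, W, b, sigma, n=1000):
    """Monte Carlo estimate of E_q[log p(x) - log q(x)].

    log_p may be an unnormalised log target density.
    """
    x = sample_q(rng, n, z, W, b, sigma)
    return np.mean(log_p(x) - log_q(x, z, W, b, sigma))


# Example: standard normal target in 2D, five particles, identity kernel map.
rng = np.random.default_rng(0)
particles = rng.standard_normal((5, 2))
W, b, sigma = np.eye(2), np.zeros(2), 1.0
log_target = lambda x: -0.5 * np.sum(x ** 2, axis=1) - np.log(2.0 * np.pi)
elbo = elbo_estimate(rng, log_target, particles, W, b, sigma)
```

Because the estimate is an explicit function of the particles and the kernel parameters, both can in principle be adjusted by gradient-based updates, which is the role the particle approximation of the gradient flow plays in the method described above.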
