R-ParVI: Particle-based variational inference through lens of rewards
Yongchao Huang
TL;DR
This paper introduces R-ParVI, a gradient-free, reward-guided particle-based variational inference method for sampling from densities known only up to a normalizing constant. It frames sampling as a reward-driven particle flow with the two-term objective $R(\mathbf{x}) = \alpha \tilde{p}(\mathbf{x}) + \beta \big(-\tilde{p}(\mathbf{x}) \log \tilde{p}(\mathbf{x})\big)$, where $\tilde{p}$ is the unnormalized target density and $\alpha, \beta$ are weights. This objective simultaneously steers particles towards high-density regions and maintains diversity through the entropy-like second term, and each particle is updated via stochastic moves and velocity adjustments. The approach emphasizes scalability: each iteration costs $\mathcal{O}(Md)$ for $M$ particles in $d$ dimensions, avoids costly inter-particle kernel computations, and parallelizes straightforwardly because particles are updated independently. At present the reward framing is conceptual and the method omits explicit particle-particle interactions; future work proposes incorporating neighborhood information and a fuller reinforcement learning (RL) formulation to further improve performance on complex probabilistic models.
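To make the two-term reward concrete, the following minimal Python sketch evaluates $R$ for a given value of the unnormalized density. The target `p_tilde` (an unnormalized standard Gaussian) and the default weights `alpha = beta = 1` are illustrative assumptions, not choices taken from the paper.

```python
import math

def reward(p: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """R = alpha * p + beta * (-p * log p), where p = p_tilde(x) >= 0."""
    if p < 0.0:
        raise ValueError("an unnormalised density must be non-negative")
    if p == 0.0:
        # -p log p -> 0 as p -> 0+, so both terms vanish at p = 0
        return 0.0
    return alpha * p + beta * (-p * math.log(p))

def p_tilde(x: list[float]) -> float:
    # Hypothetical target: unnormalised standard Gaussian exp(-||x||^2 / 2)
    return math.exp(-0.5 * sum(xi * xi for xi in x))

print(reward(p_tilde([0.0, 0.0])))  # mode: p_tilde = 1, entropy term is 0 -> 1.0
print(reward(p_tilde([1.0, 1.0])))  # p_tilde = e^{-1}: R = 2/e, about 0.7358
```

At the mode of this target $\tilde{p} = 1$, so the entropy-like term vanishes and $R = \alpha$. Since $-p \log p$ peaks at $p = e^{-1}$, the $\beta$ term rewards moderate-density regions more than the mode itself, which is one way to read its diversity-promoting role; note that where this peak falls relative to the target depends on the arbitrary scale of $\tilde{p}$.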
Abstract
A reward-guided, gradient-free ParVI method, R-ParVI, is proposed for sampling from partially known densities (e.g. densities known only up to a normalizing constant). R-ParVI formulates the sampling problem as a particle flow driven by rewards: particles are drawn from a prior distribution and move through parameter space according to a reward mechanism that blends assessments of the target density, and the steady-state particle configuration approximates the target geometry. Particle-environment interactions are simulated by stochastic perturbations and the reward mechanism, which together drive particles towards high-density regions while maintaining diversity (e.g. preventing them from collapsing into clusters). R-ParVI offers fast, flexible, scalable and stochastic sampling and inference for a class of probabilistic models, such as those encountered in Bayesian inference and generative modelling.
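The particle-flow structure described above can be sketched as follows. This is a hypothetical illustration, not the paper's update rule: the proposal (previous velocity scaled by `damping` plus Gaussian noise of scale `sigma`), the acceptance rule $\min(1, R(\mathbf{x}')/R(\mathbf{x}))$, the velocity reset on rejection, the uniform prior and the Gaussian target are all assumptions chosen for concreteness. What it shows is the cost structure: each sweep touches every particle and every coordinate once, i.e. $\mathcal{O}(Md)$ work, with no particle-particle terms. It makes no claim that the particles converge to the target.

```python
import math
import random

def p_tilde(x):
    # Hypothetical target: unnormalised 2-D standard Gaussian exp(-||x||^2 / 2), so 0 < p <= 1
    return math.exp(-0.5 * sum(xi * xi for xi in x))

def reward(p, alpha=1.0, beta=1.0):
    # R = alpha * p + beta * (-p log p); for 0 < p <= 1 both terms are >= 0
    return alpha * p + beta * (-p * math.log(p)) if p > 0.0 else 0.0

def sweep(particles, velocities, rng, sigma=0.5, damping=0.5):
    """Update every particle independently: O(M * d) work, no particle-particle terms."""
    for i in range(len(particles)):
        x, v = particles[i], velocities[i]
        # Stochastic move: damped previous velocity plus isotropic Gaussian noise
        proposal = [xj + damping * vj + rng.gauss(0.0, sigma) for xj, vj in zip(x, v)]
        r_old, r_new = reward(p_tilde(x)), reward(p_tilde(proposal))
        # Accept with probability min(1, R_new / R_old); lower-reward moves remain possible
        if r_old == 0.0 or rng.random() < min(1.0, r_new / r_old):
            velocities[i] = [pj - xj for pj, xj in zip(proposal, x)]  # velocity adjustment
            particles[i] = proposal
        else:
            velocities[i] = [0.0] * len(x)  # reset velocity after a rejected move

rng = random.Random(0)
M, d = 200, 2
# Hypothetical prior: uniform on the box [-4, 4]^d
particles = [[rng.uniform(-4.0, 4.0) for _ in range(d)] for _ in range(M)]
velocities = [[0.0] * d for _ in range(M)]
initial_msq = sum(sum(c * c for c in x) for x in particles) / M
for _ in range(300):
    sweep(particles, velocities, rng)
final_msq = sum(sum(c * c for c in x) for x in particles) / M
print(f"mean squared norm: {initial_msq:.2f} -> {final_msq:.2f}")
```

Because the loop body in `sweep` reads and writes only particle `i`, its iterations could be dispatched to separate workers, which is the kind of parallelism the independence between particles permits. As a sanity check on the design, setting `beta=0.0` and `damping=0.0` reduces this sketch to plain random-walk Metropolis targeting $\tilde{p}$.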
