Finite-Particle Rates for Regularized Stein Variational Gradient Descent
Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal
TL;DR
This work develops non-asymptotic, finite-particle rates for Regularized Stein Variational Gradient Descent (R-SVGD), a resolvent-preconditioned variant of SVGD designed to remove the kernel-induced bias of SVGD and better approximate the Wasserstein gradient flow. Through both continuous- and discrete-time analyses, the authors establish decay of the time-averaged Fisher information and, under a transport-information inequality, Wasserstein-1 convergence of the annealed (time-averaged) empirical measures. Their explicit bounds separate the effects of initialization, finite-particle error, and regularization. They identify a principled tuning regime for the regularization parameter $\nu$, the step size $h$, and the averaging horizon: keeping $\nu$ close to 1 yields a near-optimal $N^{-1}$ rate, while letting $\nu$ tend to 0 (approaching the Wasserstein gradient flow) gives a controlled trade-off at the cost of slower rates. Overall, the results provide convergence guarantees for practical R-SVGD implementations whose rate exponents do not depend on the kernel. They also link Fisher information decay to $W_1$ convergence and offer tuning guidance for high-dimensional Bayesian inference and related tasks.
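For orientation, the velocity fields being interpolated can be summarized as follows. Write $T_{k,\mu}f(x) = \int k(x,y)\,f(y)\,d\mu(y)$ for the kernel integral operator. This assumes the resolvent form of the regularized flow from He et al. (2024), with particles moving by $\dot{x} = v_\nu[\mu](x)$. The notation is ours and may differ from the paper's.

$$
v_\nu[\mu] = \big((1-\nu)\,T_{k,\mu} + \nu I\big)^{-1} T_{k,\mu} \nabla \log \frac{\pi}{\mu}, \qquad 0 < \nu \le 1,
$$
$$
v_1[\mu] = T_{k,\mu} \nabla \log \frac{\pi}{\mu} \quad \text{(SVGD)}, \qquad
v_\nu[\mu] \xrightarrow[\nu \to 0]{} \nabla \log \frac{\pi}{\mu} \quad \text{(formally; Wasserstein gradient flow of } \mathrm{KL}(\mu \,\|\, \pi)\text{)}.
$$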
Abstract
We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024), which corrects the constant-order bias of SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle system, we establish explicit non-asymptotic bounds for time-averaged (annealed) empirical measures. These bounds show convergence in the true (non-kernelized) Fisher information and, under a $\mathrm{W}_1\mathrm{I}$ condition on the target, corresponding $\mathrm{W}_1$ convergence for a large class of smooth kernels. Our analysis covers both continuous- and discrete-time dynamics. It also yields principled tuning rules for the regularization parameter, step size, and averaging horizon, which quantify the trade-off between approximating the Wasserstein gradient flow and controlling finite-particle estimation error.
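To make the resolvent preconditioner concrete, here is a minimal NumPy sketch of one natural finite-particle implementation. It is not the authors' code. It assumes the following:

- a Gaussian RBF kernel with fixed bandwidth;
- a forward-Euler step $x_i \leftarrow x_i + h\,v_i$;
- a preconditioned direction $V$ at the particles that solves the $N \times N$ system $\big((1-\nu)K/N + \nu I\big)V = G$, where $K$ is the kernel Gram matrix and $G$ stacks the usual SVGD directions.

The helper names (`rbf_kernel`, `rsvgd_direction`, `run_rsvgd`), the standard Gaussian target, and the averaging window are hypothetical choices made for illustration. The paper's exact discretization and tuning may differ.

```python
import numpy as np


def rbf_kernel(X, bandwidth):
    """Gaussian Gram matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 bandwidth^2)) plus pairwise diffs."""
    diffs = X[:, None, :] - X[None, :, :]  # diffs[i, j] = x_i - x_j, shape (N, N, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    return K, diffs


def rsvgd_direction(X, grad_log_pi, nu, bandwidth=1.0):
    """Resolvent-preconditioned SVGD direction at the particles (illustrative sketch).

    nu = 1 gives the plain SVGD direction; smaller nu moves it toward the
    non-kernelized Wasserstein gradient direction. Requires 0 < nu <= 1.
    """
    if not 0.0 < nu <= 1.0:
        raise ValueError("nu must lie in (0, 1]")
    N = X.shape[0]
    K, diffs = rbf_kernel(X, bandwidth)
    # SVGD direction: G_i = (1/N) sum_j [k(x_i, x_j) grad log pi(x_j) + grad_{x_j} k(x_i, x_j)],
    # where grad_{x_j} k(x_i, x_j) = k(x_i, x_j) (x_i - x_j) / bandwidth^2 for the Gaussian kernel.
    drift = K @ grad_log_pi(X)
    repulsion = np.einsum("ij,ijd->id", K, diffs) / bandwidth ** 2
    G = (drift + repulsion) / N
    # Resolvent preconditioner on the empirical measure: solve ((1 - nu) K / N + nu I) V = G.
    A = (1.0 - nu) * K / N + nu * np.eye(N)
    return np.linalg.solve(A, G)


def run_rsvgd(X0, grad_log_pi, nu, h, n_steps, bandwidth=1.0):
    """Forward-Euler R-SVGD; returns final particles and atoms of the annealed measure.

    The annealed (time-averaged) empirical measure is taken here as the uniform
    mixture of the empirical measures at iterations 1..n_steps (an assumption).
    """
    X = np.array(X0, dtype=float)
    iterates = []
    for _ in range(n_steps):
        X = X + h * rsvgd_direction(X, grad_log_pi, nu, bandwidth)
        iterates.append(X.copy())
    annealed = np.concatenate(iterates, axis=0)  # N * n_steps equally weighted atoms
    return X, annealed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=3.0, scale=0.5, size=(50, 1))  # particles start far from the target
    grad_log_pi = lambda X: -X  # hypothetical target: standard Gaussian, grad log pi(x) = -x
    XT, annealed = run_rsvgd(X0, grad_log_pi, nu=0.5, h=0.1, n_steps=500)
    print("final mean", XT.mean(), "final std", XT.std())
```

Setting `nu=1.0` makes the system matrix the identity, so the direction reduces to plain SVGD. Smaller `nu` amplifies the components that the kernel smooths away, by up to a factor of `1/nu`. The price is a worse-conditioned linear solve as `nu` approaches 0.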
