Finite-Particle Rates for Regularized Stein Variational Gradient Descent

Ye He; Krishnakumar Balasubramanian; Sayan Banerjee; Promit Ghosal

TL;DR

This work develops non-asymptotic, finite-particle rates for Regularized Stein Variational Gradient Descent (R-SVGD), a resolvent-preconditioned variant of SVGD designed to remove kernel-induced bias and better approximate the Wasserstein gradient flow. Through both continuous- and discrete-time analyses, the authors show decay of the time-averaged Fisher information and, under a transport-information ($\mathrm{W}_1\mathrm{I}$) inequality, convergence in Wasserstein-1 distance for the annealed (time-averaged) measures, with explicit bounds that separate initialization, finite-particle error, and regularization effects. They identify a principled tuning regime for the regularization parameter $\nu$, step size $h$, and averaging horizon: a near-optimal $N^{-1}$ rate when $\nu$ remains close to 1, and slower rates as $\nu$ tends to 0, where the dynamics approach the Wasserstein gradient flow. Overall, the results give convergence guarantees for practical R-SVGD implementations whose rate exponents do not depend on the kernel, link Fisher information decay to $W_1$ convergence, and offer guidance for tuning in high-dimensional Bayesian inference and related tasks.

Abstract

We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024), which corrects the constant-order bias of SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle system, we establish explicit non-asymptotic bounds for time-averaged (annealed) empirical measures, showing convergence in the true (non-kernelized) Fisher information and, under a $\mathrm{W}_1\mathrm{I}$ condition on the target, corresponding $\mathrm{W}_1$ convergence for a large class of smooth kernels. Our analysis covers both continuous- and discrete-time dynamics and yields principled tuning rules for the regularization parameter, step size, and averaging horizon that quantify the trade-off between approximating the Wasserstein gradient flow and controlling finite-particle estimation error.
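To make the idea of a resolvent-type preconditioner concrete, here is a minimal NumPy sketch of one possible particle update. It is an illustration under stated assumptions, not the authors' algorithm as specified in the paper: it assumes the preconditioner acts on the usual SVGD direction through the matrix $\left((1-\nu)K/N + \nu I\right)^{-1}$, where $K$ is the $N \times N$ kernel Gram matrix, and it assumes a Gaussian RBF kernel. The function names (`rbf_kernel`, `svgd_direction`, `rsvgd_step`, `run_rsvgd`), the `bandwidth` parameter, and the uniform averaging over iterates with a constant step size are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(x, bandwidth=1.0):
    """Gaussian RBF Gram matrix and pairwise differences (assumed kernel)."""
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return K, diff


def svgd_direction(x, grad_log_pi, bandwidth=1.0):
    """Standard SVGD direction: kernel-weighted score plus repulsion."""
    N = x.shape[0]
    K, diff = rbf_kernel(x, bandwidth)
    drive = K @ grad_log_pi(x)                    # sum_j k(x_i, x_j) * score(x_j)
    # Gradient of the RBF kernel in its first argument, summed over j.
    repulse = np.sum(K[:, :, None] * diff, axis=1) / bandwidth ** 2
    return (drive + repulse) / N, K


def rsvgd_step(x, grad_log_pi, nu, h, bandwidth=1.0):
    """One regularized step. ASSUMPTION: the preconditioner is
    ((1 - nu) * K / N + nu * I)^{-1} applied to the SVGD direction."""
    N = x.shape[0]
    phi, K = svgd_direction(x, grad_log_pi, bandwidth)
    A = (1.0 - nu) * K / N + nu * np.eye(N)       # positive definite for nu > 0
    return x + h * np.linalg.solve(A, phi)


def run_rsvgd(x0, grad_log_pi, nu, h, n_steps, bandwidth=1.0):
    """Return all iterates with shape (n_steps + 1, N, d). Pooling them with
    equal weight mimics a time-averaged empirical measure for constant h."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(rsvgd_step(traj[-1], grad_log_pi, nu, h, bandwidth))
    return np.stack(traj)


# Example: standard Gaussian target, whose score is -x.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(20, 2)) + 3.0
traj = run_rsvgd(x0, lambda x: -x, nu=0.5, h=0.1, n_steps=5)
```

In this sketch, setting `nu=1.0` makes the preconditioner the identity, so the step reduces to plain SVGD, while smaller `nu` inverts more of the kernel smoothing. This is in line with the trade-off described above between approximating the Wasserstein gradient flow and controlling finite-particle error.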

Paper Structure (18 sections, 15 theorems, 167 equations, 1 table)

This paper contains 18 sections, 15 theorems, 167 equations, 1 table.

Introduction
Preliminaries on Regularized Stein Variational Gradient Descent
Particle-based spatial discretization (R-SVGD dynamics).
R-SVGD Algorithm.
Convergence of Finite-Particle R-SVGF
Convergence of regularized Fisher information
Convergence in Fisher information and Wasserstein distance
Convergence of R-SVGD Algorithm
Notations
Proofs for Section \ref{sec:rsvgddynamics}
Proofs for Section \ref{sec:rsvgdalgorithm}
Rates under constant $h$ and $\nu$
Relation between regularized Fisher information and KSD
Convexity of regularized Fisher information
Preliminaries on Reproducing Kernel Hilbert Spaces (RKHS)
...and 3 more sections

Key Result

Theorem 1

Under Assumption \ref{assump:kernel} (1) and (2), let $\{x^i(t)\}_{i=1}^N$ be the set of $N$ particles along the R-SVGD dynamics, let $p^N(t)$ be the joint distribution of $\underline{x}(t)\coloneqq(x^1(t),\cdots,x^N(t))\in \mathbb{R}^{dN}$, and let $\rho^N(t)=\frac{1}{N}\sum_{j=1}^N \delta_{x^j(t)}$ be the empirical measure … where $C^*(\underline{x})$ is given in \eqref{eq:C*}, $p^N(0)=p_0^N$, and let $\rho^N_{av,T}\coloneqq \ldots$

Theorems & Definitions (39)

Theorem 1
Remark 1: Challenge in estimating $\mathbb{E}\lbrack C^*(\underline{x}(t))\rbrack$
Corollary 1
Remark 2
Remark 3: Matching SVGD bounds under regularization for $\nu$ close to $1$
Remark 4: Convergence rate for small $\nu$
Theorem 2
Theorem 3
Theorem 4
Remark 5
...and 29 more
