Finite-Particle Rates for Regularized Stein Variational Gradient Descent

Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal

TL;DR

This work develops non-asymptotic, finite-particle rates for regularized Stein variational gradient descent (R-SVGD), a resolvent-preconditioned variant of SVGD designed to remove kernel-induced bias and better approximate the Wasserstein gradient flow. Through both continuous- and discrete-time analyses, the authors show decay of the time-averaged Fisher information and, under a transport-information inequality, convergence in Wasserstein-1 distance for annealed (time-averaged) measures, with explicit bounds that separate the contributions of initialization, finite-particle error, and regularization. They identify a principled tuning regime for the regularization parameter $\nu$, step size $h$, and averaging horizon: the rate is a near-optimal $N^{-1}$ when $\nu$ remains close to 1, while letting $\nu$ tend to 0 (approaching the Wasserstein flow) comes at the cost of slower rates. Overall, the results provide kernel-agnostic (in the exponent) convergence guarantees for practical R-SVGD implementations, link Fisher information decay to $W_1$ convergence, and offer guidance for tuning in high-dimensional Bayesian inference and related tasks.

Abstract

We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024), which corrects the constant-order bias of SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle system, we establish explicit non-asymptotic bounds for time-averaged (annealed) empirical measures, illustrating convergence in the true (non-kernelized) Fisher information and, under a $\mathrm{W}_1\mathrm{I}$ condition on the target, corresponding $\mathrm{W}_1$ convergence for a large class of smooth kernels. Our analysis covers both continuous- and discrete-time dynamics and yields principled tuning rules for the regularization parameter, step size, and averaging horizon that quantify the trade-off between approximating the Wasserstein gradient flow and controlling finite-particle estimation error.
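To make the "resolvent-type preconditioner" concrete, the following is a minimal NumPy sketch of one discrete-time particle update. It is not taken from the paper: it assumes a Gaussian RBF kernel and assumes that the preconditioner is discretized as the $N \times N$ matrix $(1-\nu)K/N + \nu I$ applied to the usual SVGD direction. The function names, the `bandwidth` parameter, and the sign and scaling conventions are all illustrative choices.

```python
import numpy as np


def rbf_gram(x, bandwidth):
    """Gaussian RBF Gram matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 * bandwidth^2)).

    Also returns the pairwise differences diff[i, j] = x_i - x_j.
    """
    diff = x[:, None, :] - x[None, :, :]          # shape (N, N, d)
    sq_dist = np.sum(diff ** 2, axis=-1)          # shape (N, N)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)), diff


def r_svgd_step(x, score, nu, h, bandwidth=1.0):
    """One hypothetical R-SVGD step (a sketch under the assumptions stated above).

    x      : (N, d) array of particles
    score  : callable returning grad log pi at each particle, shape (N, d)
    nu     : regularization parameter in (0, 1]
    h      : step size
    """
    n = x.shape[0]
    K, diff = rbf_gram(x, bandwidth)

    # Standard SVGD direction: kernel-weighted scores plus a repulsive term.
    drive = K @ score(x)
    repulse = np.einsum("ij,ijd->id", K, diff) / bandwidth ** 2
    svgd_dir = (drive + repulse) / n

    # Assumed finite-particle form of the resolvent preconditioner.
    precond = (1.0 - nu) * K / n + nu * np.eye(n)
    direction = np.linalg.solve(precond, svgd_dir)

    return x + h * direction
```

In this sketch, setting `nu=1.0` makes the preconditioner the identity, so the step reduces to a plain SVGD update; smaller `nu` gives the Gram-matrix inverse more weight, which is one way to read the trade-off between approximating the Wasserstein gradient flow and controlling finite-particle error described above.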

Paper Structure (18 sections, 15 theorems, 167 equations, 1 table)

Key Result

Theorem 1

Under Assumption assump:kernel (1) and (2), let $\{x^i(t)\}_{i=1}^N$ be the set of $N$ particles along the R-SVGD dynamics, $p^N(t)$ be the joint distribution of $\underline{x}(t)\coloneqq(x^1(t),\ldots,x^N(t))\in \mathbb{R}^{dN}$, and $\rho^N(t)=\frac{1}{N}\sum_{j=1}^N \delta_{x^j(t)}$ be the empirical measure [...] where $C^*(\underline{x})$ is given in eq:C*, $p^N(0)=p_0^N$, and let $\rho^N_{av,T}\coloneqq \ldots$

Theorems & Definitions (39)

  • Theorem 1
  • Remark 1: Challenge in estimating $\mathbb{E}\lbrack C^*(\underline{x}(t))\rbrack$
  • Corollary 1
  • Remark 2
  • Remark 3: Matching SVGD bounds under regularization for $\nu$ close to $1$
  • Remark 4: Convergence rate for small $\nu$
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Remark 5
  • ...and 29 more