The Stein-log-Sobolev inequality and the exponential rate of convergence for the continuous Stein variational gradient descent method
José A. Carrillo, Jakub Skrzeczkowski, Jethro Warnett
TL;DR
This work proves the Stein-log-Sobolev inequality (SLSI) for a broad class of kernels, which yields exponential convergence of the continuous Stein variational gradient descent (SVGD). The central idea is to interpret the Stein-Fisher information as a duality pairing between $H^{-1}(\mathbb{R}^d)$ and $H^{1}(\mathbb{R}^d)$ and to use Fourier analysis to derive bounds for kernels of the form $K(x,y)=e^{V(x)-V_0(x)/2}\,k(x-y)\,e^{V(y)-V_0(y)/2}$. The paper treats kernels whose Fourier transforms have quadratic decay at infinity, provides explicit constants for the SLSI, and proves the existence of weak solutions to the mean-field Stein gradient flow that decay exponentially toward equilibrium. It also gives examples in which the SLSI fails, which indicates that the assumptions are partly necessary and offers guidance for kernel design in SVGD.
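To make the link between the SLSI and the exponential rate concrete, the standard Grönwall argument can be sketched as follows. The notation here is an assumption and is not fixed by the summary above: $\rho_t$ denotes the mean-field solution, $\rho_\infty \propto e^{-V}$ the target, $\mathrm{KL}$ the relative entropy, $I_{\mathrm{Stein}}$ the Stein-Fisher information, and $\lambda > 0$ the SLSI constant under one common normalization. The computation is formal and assumes enough regularity to differentiate the entropy along the flow.

```latex
% Entropy dissipation along the mean-field Stein gradient flow (formal)
\frac{d}{dt}\,\mathrm{KL}(\rho_t \,|\, \rho_\infty)
  = - I_{\mathrm{Stein}}(\rho_t \,|\, \rho_\infty),
\qquad
I_{\mathrm{Stein}}(\rho \,|\, \rho_\infty)
  = \iint K(x,y)\,
    \nabla \log\frac{\rho}{\rho_\infty}(x) \cdot
    \nabla \log\frac{\rho}{\rho_\infty}(y)\,
    \rho(x)\,\rho(y)\,dx\,dy .

% Stein-log-Sobolev inequality with constant \lambda (normalization assumed)
\mathrm{KL}(\rho \,|\, \rho_\infty)
  \le \frac{1}{2\lambda}\, I_{\mathrm{Stein}}(\rho \,|\, \rho_\infty).

% Combining the two and applying Gronwall's lemma
\frac{d}{dt}\,\mathrm{KL}(\rho_t \,|\, \rho_\infty)
  \le -2\lambda\,\mathrm{KL}(\rho_t \,|\, \rho_\infty)
\;\Longrightarrow\;
\mathrm{KL}(\rho_t \,|\, \rho_\infty)
  \le e^{-2\lambda t}\,\mathrm{KL}(\rho_0 \,|\, \rho_\infty).
```

With a different normalization of the inequality, the factor $2\lambda$ in the exponent changes accordingly; the structure of the argument stays the same.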
Abstract
The Stein Variational Gradient Descent method is a variational inference method in statistics that has recently received a lot of attention. The method provides a deterministic approximation of the target distribution by introducing a nonlocal interaction with a kernel. Despite the significant interest, the exponential rate of convergence for the continuous method has remained an open problem, owing to the difficulty of establishing the related Stein-log-Sobolev inequality. Here, we prove that the inequality holds in every space dimension and for every kernel whose Fourier transform has quadratic decay at infinity and is locally bounded away from zero and infinity. Moreover, we construct weak solutions to the related PDE that decay towards the equilibrium at an exponential rate. The main novelty in our approach is to interpret the Stein-Fisher information, also called the squared Stein discrepancy, as a duality pairing between $H^{-1}(\mathbb{R}^d)$ and $H^{1}(\mathbb{R}^d)$, which allows us to employ the Fourier transform. We also provide several examples of kernels for which the Stein-log-Sobolev inequality fails, partially showing the necessity of our assumptions.
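For readers unfamiliar with the method, the deterministic particle scheme behind SVGD can be sketched in a few lines of NumPy. This is a minimal illustration, not the setting analysed here: the function names `svgd_step` and `run_svgd`, the step size, the bandwidth, and the standard Gaussian target are all hypothetical choices, and the Gaussian RBF kernel is a common default swapped in for simplicity; it is not claimed to satisfy the Fourier-decay assumptions stated above.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, bandwidth=1.0):
    """One explicit Euler step of particle SVGD for x of shape (n, d).

    Uses a Gaussian RBF kernel k(x, y) = exp(-|x - y|^2 / (2 h^2)),
    chosen here only for illustration.
    """
    diff = x[:, None, :] - x[None, :, :]                  # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * bandwidth**2))  # (n, n), symmetric
    # Driving term: sum_j k(x_i, x_j) * grad log p(x_j), pulls particles to high density.
    drive = k @ grad_log_p(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j k_ij (x_i - x_j) / h^2.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / bandwidth**2
    return x + step * (drive + repulse) / x.shape[0]

def run_svgd(x0, grad_log_p, n_steps=500, step=0.1, bandwidth=1.0):
    """Iterate svgd_step from the initial particles x0."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p, step, bandwidth)
    return x

# Illustrative target: standard Gaussian, V(x) = |x|^2 / 2, so grad log p(x) = -x.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=0.5, size=(100, 1))  # particles start far from the target
particles = run_svgd(x0, lambda x: -x)
```

The driving term transports particles toward regions of high target density, while the kernel-gradient term keeps them from collapsing onto the mode; this pairwise coupling is the nonlocal interaction referred to above. The continuous method studied in the paper corresponds to the limit of infinitely many particles and vanishing step size.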
