Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions

Viktor Stein; Wuchen Li

TL;DR

The paper proposes ASVGD, an accelerated SVGD variant formulated as a momentum-based gradient flow on the density manifold and augmented with a Stein-Wasserstein metric for stability. For generalized bilinear kernels $K(x,y) = x^\top A y + 1$ and Gaussian targets, it proves that the dynamics preserve Gaussianity, and it derives an A-optimal, parameter-free convergence rate that depends on the condition number of the target covariance $Q$ through $\sqrt{\kappa(Q)}$. It also identifies an optimal damping constant that is independent of the smallest eigenvalue of $Q$, which yields asymptotic convergence guarantees, and it shows in simulations that ASVGD outperforms SVGD and other samplers on toy problems and Bayesian neural networks. These results suggest that acceleration in density space can improve sampling efficiency for high-dimensional Bayesian inference tasks, including Bayesian neural networks and related probabilistic models.

Abstract

Stein variational gradient descent (SVGD) is a kernel-based, non-parametric particle method for sampling from a target distribution, for example in Bayesian inference and other machine learning tasks. Unlike other particle methods, SVGD does not require estimating the score, which is the gradient of the log-density. In practice, however, SVGD can be slow compared to score-estimation-based sampling algorithms. To design a fast and efficient high-dimensional sampling algorithm with the advantages of SVGD, we introduce accelerated SVGD (ASVGD), based on an accelerated gradient flow in a metric space of probability densities following Nesterov's method. We then derive a momentum-based discrete-time sampling algorithm, which evolves a set of particles deterministically. To stabilize the particles' position update, we also include a Wasserstein metric regularization. This paper extends the conference version \cite{SL2025}. For the bilinear kernel and Gaussian target distributions, we determine the kernel and damping parameters that give the optimal convergence rate of the proposed dynamics. This is achieved by analyzing the linearized accelerated gradient flows at the equilibrium. Interestingly, the optimal parameter is a constant, which does not depend on the covariance of the target distribution. For more general kernel functions, such as the Gaussian kernel, numerical examples with varied target distributions demonstrate the effectiveness of ASVGD compared to SVGD and other popular sampling methods. Furthermore, we show that in the setting of Bayesian neural networks, ASVGD outperforms SVGD significantly in terms of log-likelihood and total iteration times.
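As a point of reference for the setting analyzed here, the following is a minimal NumPy sketch of plain (non-accelerated) SVGD with the generalized bilinear kernel K(x,y) = x^T A y + 1 and a centered Gaussian target N(0, Q). It is not the ASVGD algorithm of the paper: it has no momentum, damping, or Wasserstein regularization. The function name `svgd_step_bilinear`, the choices A = I and Q = diag(1, 4), the step size, the particle count, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def svgd_step_bilinear(X, A, Q_inv, step):
    """One explicit Euler step of plain SVGD for the target N(0, Q).

    X has one particle per row. The kernel is K(x, y) = x^T A y + 1,
    with A and Q assumed symmetric. Hypothetical helper for illustration.
    """
    n = X.shape[0]
    K = X @ A @ X.T + 1.0      # K[j, i] = x_j^T A x_i + 1
    score = -X @ Q_inv         # row j: grad log pi(x_j) = -Q^{-1} x_j
    drive = K.T @ score / n    # kernel-weighted average of target scores
    repulse = X @ A.T          # mean over j of grad_{x_j} K(x_j, x_i) = A x_i
    return X + step * (drive + repulse)

# Illustrative setup (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
Q = np.diag([1.0, 4.0])        # target covariance
A = np.eye(2)                  # kernel matrix
Q_inv = np.linalg.inv(Q)
X = rng.normal(size=(50, 2)) + 1.0   # particles start off-center

for _ in range(2000):
    X = svgd_step_bilinear(X, A, Q_inv, step=0.05)

mean = X.mean(axis=0)
second_moment = X.T @ X / X.shape[0]
print("empirical mean:", mean)
print("empirical second moment:\n", second_moment)
```

Because the kernel is bilinear and the target score is linear, the update is an affine map of each particle, so a Gaussian-shaped particle cloud stays Gaussian-shaped. With more particles than dimensions, the empirical mean and second moment in this sketch should approach 0 and Q, which makes it a convenient baseline against which an accelerated variant could be compared.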
