
Multirate Stein Variational Gradient Descent for Efficient Bayesian Sampling

Arash Sarshar

Abstract

Many particle-based Bayesian inference methods use a single global step size for all parts of the update. In Stein variational gradient descent (SVGD), however, each update combines two qualitatively different effects: attraction toward high-posterior regions and repulsion that preserves particle diversity. These effects can evolve at different rates, especially in high-dimensional, anisotropic, or hierarchical posteriors, so one step size can be unstable in some regions and inefficient in others. We derive a multirate version of SVGD that updates these components on different time scales. The framework yields practical algorithms, including a symmetric split method, a fixed multirate method (MR-SVGD), and an adaptive multirate method (Adapt-MR-SVGD) with local error control. We evaluate the methods in a broad and rigorous benchmark suite covering six problem families: a 50D Gaussian target, multiple 2D synthetic targets, UCI Bayesian logistic regression, multimodal Gaussian mixtures, Bayesian neural networks, and large-scale hierarchical logistic regression. Evaluation includes posterior-matching metrics, predictive performance, calibration quality, mixing, and explicit computational cost accounting. Across these six benchmark families, multirate SVGD variants improve robustness and quality-cost tradeoffs relative to vanilla SVGD. The strongest gains appear on stiff hierarchical, strongly anisotropic, and multimodal targets, where adaptive multirate SVGD is usually the strongest variant and fixed multirate SVGD provides a simpler robust alternative at lower cost.
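The attraction/repulsion decomposition described above can be made concrete with a minimal sketch of a vanilla SVGD update. This is an illustrative assumption-laden example, not the paper's multirate variants: it assumes an RBF kernel with fixed bandwidth and a 2D standard-Gaussian target, and the function name `svgd_direction` is hypothetical.

```python
import numpy as np

def svgd_direction(X, grad_logp, h=1.0):
    """One vanilla SVGD update direction for particles X of shape (n, d).

    The direction is the sum of an attraction term (kernel-weighted score
    gradients) and a repulsion term (gradient of the kernel), averaged
    over particles. RBF bandwidth h is fixed here for simplicity.
    """
    diff = X[:, None, :] - X[None, :, :]                  # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))
    attract = K @ grad_logp(X)                            # pulls toward high posterior
    repulse = (diff / h ** 2 * K[:, :, None]).sum(axis=1) # pushes particles apart
    return (attract + repulse) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=0.5, size=(30, 2))  # particles start off-target
grad_logp = lambda X: -X                          # score of a standard Gaussian

for _ in range(500):
    X = X + 0.1 * svgd_direction(X, grad_logp)    # single global step size
```

In this single-rate form, the same step size multiplies both terms; the multirate methods in the paper instead advance the attraction and repulsion components on different time scales.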

Paper Structure

This paper contains 31 sections, 23 equations, 7 figures, 4 tables, and 2 algorithms.

Figures (7)

  • Figure 1: 50D Gaussian final-checkpoint summary across methods. The four panels report $\|\hat{\mu}-\mu\|_2$, $\|\hat{\Sigma}-\Sigma\|_F$, KSD, and ESS, respectively. Markers show mean values across seeds and error bars indicate one standard deviation. Lower is better for the first three metrics, whereas higher is better for ESS. This figure highlights robustness differences under strong anisotropy: Adapt-MR-SVGD is the only particle variant that keeps both moment errors and KSD under control.
  • Figure 2: 50D Gaussian: final mean $\pm$ std Pareto plot in moment-error space with marker size encoding ESS (left) and wall time (right).
  • Figure 3: 2D target visualization panels produced with Adapt-MR-SVGD under visualization-only settings. Each subfigure shows target-density contours with initial particles (left) and short-run final particles (right).
  • Figure 4: Mixture2D (mix8) fixed-budget final-checkpoint summary across methods. The four panels report mode coverage, mode entropy, mode imbalance, and KSD. Higher is better for coverage and entropy, while lower is better for mode imbalance and KSD.
  • Figure 5: UCI logistic regression summary across datasets. Each subpanel reports test accuracy, NLL, ECE, and ESS at the best finite-NLL checkpoint selected by the predictive stopping rule. Remaining datasets are shown in the continued panel.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Remark 1
  • Remark 2