Accelerating Convergence of Stein Variational Gradient Descent via Deep Unfolding

Yuya Kawamura; Satoshi Takabe

TL;DR

This paper proposes two trainable algorithms based on Stein variational gradient descent (SVGD): deep-unfolded SVGD (DUSVGD) and Chebyshev-step-based DUSVGD (C-DUSVGD). Both embed trainable parameters into SVGD using a deep-learning technique called deep unfolding.

Abstract

Stein variational gradient descent (SVGD) is a prominent particle-based variational inference method used for sampling a target distribution. SVGD has attracted interest for application in machine-learning techniques such as Bayesian inference. In this paper, we propose novel trainable algorithms that incorporate a deep-learning technique called deep unfolding into SVGD. This approach facilitates the learning of the internal parameters of SVGD, thereby accelerating its convergence. To evaluate the proposed trainable SVGD algorithms, we conducted numerical simulations of three tasks: sampling a one-dimensional Gaussian mixture, performing Bayesian logistic regression, and learning Bayesian neural networks. The results show that our proposed algorithms exhibit faster convergence than the conventional variants of SVGD.
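As a rough illustration of the kind of iteration the abstract refers to, the sketch below runs the standard SVGD update on a one-dimensional Gaussian mixture, taking one step size per iteration. It assumes, without confirmation from the abstract, that the "internal parameters" are per-iteration step sizes. The mixture weights and means, the kernel bandwidth, the particle count, and all function names are illustrative choices, and no training is performed: the step sizes are simply fixed by hand.

```python
import math
import random


def grad_log_p(x, weights=(1 / 3, 2 / 3), means=(-2.0, 2.0), std=1.0):
    """Score d/dx log p(x) of a 1-D Gaussian mixture (illustrative parameters)."""
    comps = [w * math.exp(-0.5 * ((x - m) / std) ** 2) for w, m in zip(weights, means)]
    total = sum(comps)
    return sum(c * (m - x) / std**2 for c, m in zip(comps, means)) / total


def svgd_step(particles, step_size, bandwidth=1.0):
    """One SVGD update with an RBF kernel k(a, b) = exp(-(a - b)^2 / (2 h^2))."""
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth**2))
            grad_k = -(xj - xi) / bandwidth**2 * k  # derivative of k(xj, xi) w.r.t. xj
            phi += k * sj + grad_k  # attraction toward high density + repulsion
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(particles, step_sizes):
    """Apply one SVGD update per entry of step_sizes (one 'layer' per iteration)."""
    for eps in step_sizes:
        particles = svgd_step(particles, eps)
    return particles


random.seed(0)
init = [random.gauss(0.0, 1.0) for _ in range(30)]
step_sizes = [0.5] * 50  # fixed here; a trainable variant would learn these values
final = run_svgd(init, step_sizes)
```

Because `run_svgd` takes the whole step-size schedule as an argument, the loop can be viewed as a fixed-depth computation whose per-iteration values could in principle be tuned by gradient-based training, which is the general idea behind deep unfolding.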

Paper Structure (13 sections, 16 equations, 5 figures, 2 algorithms)

Introduction
Preliminaries
Stein Variational Gradient Descent
Deep Unfolding
Chebyshev Step
Proposed Method
DUSVGD
C-DUSVGD
Numerical Experiments
Sampling of Gaussian Mixture Distribution
Bayesian Logistic Regression
Learning Bayesian Neural Networks
Conclusion

Figures (5)

Figure 1: Structure of DUSVGD.
Figure 2: Dependency of MMD on the number of iterations for DUSVGD, C-DUSVGD, SVGD with RMSProp, and SVGD with a fixed step size in sampling a one-dimensional Gaussian mixture distribution.
Figure 3: Distributions of particles of (C)-DUSVGD and SVGD algorithms after $100$ iterations in approximating a one-dimensional Gaussian mixture distribution. Each distribution is obtained via kernel density estimation with an RBF kernel of bandwidth $0.2$. The target distribution is represented by the solid line.
Figure 4: Dependency of accuracy on the number of iterations for DUSVGD, C-DUSVGD, SVGD with RMSProp, and SVGD with a fixed step size in the Bayesian logistic regression problem.
Figure 5: Dependency of the mean squared error on the number of iterations for DUSVGD, SVGD with RMSProp, and SVGD with a fixed step size in Bayesian neural networks.
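The kernel density estimate mentioned in the caption of Figure 3 can be sketched as follows. This is a minimal version that assumes "bandwidth $0.2$" means the standard deviation of a Gaussian (RBF) kernel; the particle positions and the helper name `gaussian_kde` are made up for illustration and are not taken from the paper.

```python
import math


def gaussian_kde(samples, bandwidth=0.2):
    """Return a function x -> estimated density using a Gaussian (RBF) kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


particles = [-2.1, -1.9, 1.8, 2.0, 2.2, 2.4]  # made-up particle positions
density = gaussian_kde(particles)

# Evaluate on a grid, as one would before plotting the estimate against the target.
grid = [-5.0 + 0.01 * i for i in range(1001)]
area = sum(density(x) for x in grid) * 0.01  # Riemann sum; should be close to 1
```

A smaller bandwidth makes the curve follow individual particles more closely, while a larger one smooths them together, so the choice affects how the particle distribution compares visually with the target.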
