Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation

Adam Rozzio; Rafael Athanasiades; O. Deniz Akyildiz

TL;DR

The paper proposes Momentum SVGD-EM, an accelerated interacting particle algorithm for maximum marginal likelihood estimation based on Stein variational gradient descent (SVGD). It introduces Nesterov acceleration in both the parameter updates and the space of probability measures, and it consistently reduces the number of iterations required for convergence across tasks of increasing difficulty.

Abstract

Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method over the joint space of model parameters and probability measures. Recently, a significant body of work has adopted this perspective, leading to interacting particle algorithms for MMLE. In this paper, we propose an accelerated version of one such procedure, based on Stein variational gradient descent (SVGD), by introducing Nesterov acceleration in both the parameter updates and in the space of probability measures. The resulting method, termed Momentum SVGD-EM, consistently accelerates convergence in terms of required iterations across various tasks of increasing difficulty, demonstrating effectiveness in both low- and high-dimensional settings.
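To make the idea concrete, the following is a minimal sketch of a particle-based EM loop with momentum, not the authors' algorithm. It assumes an illustrative Gaussian latent variable model, $x_i \sim \mathcal{N}(\theta, 1)$ and $y_i \mid x_i \sim \mathcal{N}(x_i, 1)$, for which the marginal likelihood is maximised at $\theta^\star = \bar{y}$. The names `rbf_kernel` and `momentum_svgd_em`, the median-heuristic bandwidth, the shared step size, and the simple look-ahead (Nesterov-style) extrapolation applied per particle are all assumptions made for illustration; the acceleration scheme in the space of probability measures described in the paper may differ. Setting `alpha=0` recovers a plain SVGD-EM-style update.

```python
import numpy as np


def rbf_kernel(X):
    """RBF kernel matrix and its gradient w.r.t. the first argument.

    The bandwidth uses the median heuristic (an illustrative choice).
    """
    diffs = X[:, None, :] - X[None, :, :]        # diffs[m, n] = X[m] - X[n]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(X.shape[0] + 1.0) + 1e-8
    K = np.exp(-sq_dists / h)
    grad_K = -2.0 / h * diffs * K[:, :, None]    # d k(X[m], X[n]) / d X[m]
    return K, grad_K


def momentum_svgd_em(y, n_particles=20, steps=1000, step=0.1, alpha=0.5, seed=0):
    """Sketch of SVGD-EM with look-ahead momentum on the assumed toy model."""
    rng = np.random.default_rng(seed)
    D = y.shape[0]
    theta, theta_prev = 0.0, 0.0
    X = rng.normal(size=(n_particles, D))        # particles approximating p(x | y)
    X_prev = X.copy()

    for _ in range(steps):
        # Extrapolate both the parameter and the particles (momentum step).
        theta_look = theta + alpha * (theta - theta_prev)
        X_look = X + alpha * (X - X_prev)

        # Gradient of log p_theta(x, y) w.r.t. theta, averaged over particles
        # and scaled by 1/D so the step size does not depend on the dimension.
        grad_theta = np.mean(X_look) - theta_look

        # Score of the joint density w.r.t. x for every particle.
        score = theta_look + y[None, :] - 2.0 * X_look

        # SVGD direction: kernel-smoothed score plus a repulsive term.
        K, grad_K = rbf_kernel(X_look)
        phi = (K @ score + grad_K.sum(axis=0)) / n_particles

        theta_prev, X_prev = theta, X
        theta = theta_look + step * grad_theta
        X = X_look + step * phi

    return theta, X


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    y_obs = 1.0 + np.sqrt(2.0) * data_rng.normal(size=5)
    theta_hat, _ = momentum_svgd_em(y_obs)
    print("estimate:", theta_hat, "sample mean:", y_obs.mean())
```

In this sketch the parameter and the particles are updated simultaneously from the same extrapolated point, which mirrors the coordinate-wise structure described above; the sample mean printed alongside the estimate is the known maximiser for this toy model and serves only as a sanity check.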

Paper Structure (26 sections, 25 equations, 22 figures, 1 table, 2 algorithms)

Introduction
Background
A variational view of EM
Euclidean-Wasserstein gradient flow of
SVGD-EM
Momentum SVGD-EM
Acceleration in
Acceleration in
Momentum SVGD-EM
Numerical experiments
Toy Hierarchical Model
Bayesian Logistic Regression
Bayesian Neural Network MNIST
Conclusion
Derivation of SVGD-WNes from SVGD-RAGD
...and 11 more sections

Figures (22)

Figure 1: Comparison of M-SVGD-EM with SVGD-EM. (a) Parameter estimation performance on a single iteration on ToyHM(10, 12) with different acceleration parameters $\alpha_{\theta}=\alpha_{X}\in\{0.3,0.5,0.9\}$. (b) MSE with respect to the empirical minimizer for each algorithm. (c) Average number of iterations until convergence, defined as being within a threshold of 0.05 of the empirical minimizer. The results are averaged over 20 independent trials.
Figure 2: (a) Parameter estimation convergence on BLR for varying acceleration. (b) Posterior distributions across test features, showing that acceleration improves convergence. (c) Test error vs. acceleration for different values of $t$, showing error reduction with increased acceleration.
Figure 3: Different initialisations of $\alpha$ and $\beta$ for accelerations $(0,0.3,0.9)$ with $N=10$ and $T=500$.
Figure 4: Test error with respect to iterations $t$ for the BNN on the MNIST dataset.
Figure 5: Log predictive probability distribution vs. iterations $t$ for the BNN on the MNIST dataset.
...and 17 more figures
