Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation

Adam Rozzio, Rafael Athanasiades, O. Deniz Akyildiz

TL;DR

The paper proposes Momentum SVGD-EM, an accelerated version of an interacting particle algorithm for maximum marginal likelihood estimation based on Stein variational gradient descent (SVGD). It introduces Nesterov acceleration in both the parameter updates and the space of probability measures, and consistently reduces the number of iterations required for convergence across various tasks of increasing difficulty.

Abstract

Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method over the joint space of model parameters and probability measures. Recently, a significant body of work has adopted this perspective, leading to interacting particle algorithms for MMLE. In this paper, we propose an accelerated version of one such procedure, based on Stein variational gradient descent (SVGD), by introducing Nesterov acceleration in both the parameter updates and in the space of probability measures. The resulting method, termed Momentum SVGD-EM, consistently accelerates convergence in terms of required iterations across various tasks of increasing difficulty, demonstrating effectiveness in both low- and high-dimensional settings.
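
To make the idea concrete: in the standard free-energy view of EM, one minimises $F(\theta, q) = \int q(x)\,\log\frac{q(x)}{p_{\theta}(x, y)}\,dx$ jointly over the parameter $\theta$ and a distribution $q$ over the latent variable $x$. The sketch below is a minimal illustration of that setup, not the authors' implementation. It assumes a toy Gaussian hierarchical model ($x_d \sim \mathcal{N}(\theta, 1)$, $y_d \sim \mathcal{N}(x_d, 1)$), for which the marginal likelihood is maximised at $\theta = \bar{y}$; this model is chosen here for illustration and is not claimed to match the paper's ToyHM. The function names `svgd_direction` and `momentum_svgd_em`, the median-heuristic bandwidth, the simultaneous (rather than alternating) updates, the step sizes, and the generic Nesterov-style look-ahead with a single coefficient `alpha` are all assumptions; the paper's exact update rules may differ.

```python
import numpy as np


def svgd_direction(x, score):
    """SVGD update direction for particles x (N, D) with scores (N, D)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]        # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)        # pairwise squared distances
    # Median-heuristic bandwidth (an assumption; any positive value works).
    h2 = 0.5 * np.median(sq_dist) / np.log(n + 1) + 1e-12
    k = np.exp(-sq_dist / (2.0 * h2))           # RBF kernel matrix (N, N)
    drift = k @ score                           # sum_j k(x_j, x_i) * score_j
    # sum_j grad_{x_j} k(x_j, x_i) = sum_j k_ij * (x_i - x_j) / h2
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h2
    return (drift + repulse) / n


def momentum_svgd_em(y, n_particles=10, n_iters=1000, step_theta=0.05,
                     step_x=0.1, alpha=0.5, seed=0):
    """Hypothetical momentum SVGD-EM loop on the assumed toy model.

    alpha = 0 gives a plain (unaccelerated) SVGD-EM-style iteration.
    Returns the trajectory of theta as an array of length n_iters.
    """
    rng = np.random.default_rng(seed)
    d = y.shape[0]
    theta, theta_prev = 0.0, 0.0
    x = rng.standard_normal((n_particles, d))
    x_prev = x.copy()
    history = np.empty(n_iters)
    for t in range(n_iters):
        # Nesterov-style look-ahead in parameter space and particle space.
        theta_la = theta + alpha * (theta - theta_prev)
        x_la = x + alpha * (x - x_prev)
        # Gradients of log p_theta(x, y), evaluated at the look-ahead point.
        grad_theta = np.mean(np.sum(x_la - theta_la, axis=1))
        score = y + theta_la - 2.0 * x_la       # grad_x log p_theta(x, y)
        theta_prev, x_prev = theta, x
        theta = theta_la + step_theta * grad_theta
        x = x_la + step_x * svgd_direction(x_la, score)
        history[t] = theta
    return history


# Synthetic data from the assumed model with true theta = 1.
rng = np.random.default_rng(1)
y = 1.0 + np.sqrt(2.0) * rng.standard_normal(10)
plain = momentum_svgd_em(y, alpha=0.0)
accel = momentum_svgd_em(y, alpha=0.5)
print("target (mean of y):", y.mean())
print("final theta, alpha=0.0:", plain[-1])
print("final theta, alpha=0.5:", accel[-1])
```

Comparing `plain` and `accel` iteration by iteration against `y.mean()` gives a rough picture of how the look-ahead changes the convergence path on this toy problem. With a finite number of particles, SVGD only approximates the posterior, so a small residual gap to `y.mean()` is expected in both runs.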

Paper Structure (26 sections, 25 equations, 22 figures, 1 table, 2 algorithms)

Figures (22)

  • Figure 1: Comparison of M-SVGD-EM with SVGD-EM. (a) Parameter estimation performance on a single iteration on ToyHM(10, 12) with different acceleration parameters $\alpha_{\theta}=\alpha_{X}=(0.3,0.5,0.9)$. (b) MSE with respect to the empirical minimizer for each algorithm. (c) Average number of iterations until convergence, defined as being within a threshold of 0.05 of the empirical minimizer. The results are averaged over 20 independent trials.
  • Figure 2: (a) Parameter estimation convergence on BLR for varying acceleration. (b) Posterior distributions across test features, showing that acceleration improves convergence. (c) Test error vs. acceleration for different values of $t$, showing that the error decreases as acceleration increases.
  • Figure 3: Different initialisations of $\alpha$ and $\beta$ for accelerations $(0,0.3,0.9)$ with $N=10$ and $T=500$.
  • Figure 4: Test error w.r.t. iterations $t$ for the BNN on the MNIST dataset.
  • Figure 5: Log predictive probability distribution vs. iterations $t$ for the BNN on the MNIST dataset.
  • ...and 17 more figures