Momentum Particle Maximum Likelihood

Jen Ning Lim; Juan Kuntz; Samuel Power; Adam M. Johansen


TL;DR

The paper addresses maximum likelihood estimation (MLE) in latent variable models by recasting marginal likelihood maximization as the minimization of a free energy $\mathcal{E}(\theta,q)$ over an extended space of parameters and probability distributions, which makes momentum-augmented dynamics possible. It introduces the Momentum-Enriched Flow and its discretization, Momentum Particle Descent (MPD), which injects momentum into both the model parameters and the latent variables, combining ideas from Nesterov acceleration and underdamped Langevin dynamics. The authors prove existence and uniqueness of the continuous-time flow, establish exponential convergence under a Log-Sobolev-type inequality, and justify the particle approximation via propagation of chaos. Empirically, MPD outperforms Particle Gradient Descent (PGD) and other baselines on a toy hierarchical model and on image-generation tasks (e.g., VAEs with VampPrior), indicating practical benefits for scalable, accelerated MLE in latent variable models.
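
For orientation, the free energy mentioned above is commonly written as follows in the particle-based MLE literature. The notation is an assumption made here, not taken from this summary: $y$ denotes the observed data, $x$ the latent variables, and $p_\theta(x, y)$ the joint model density.

$$
\mathcal{E}(\theta, q) = \int q(x) \log \frac{q(x)}{p_\theta(x, y)} \, \mathrm{d}x = \mathrm{KL}\left( q \,\|\, p_\theta(\cdot \mid y) \right) - \log p_\theta(y).
$$

Since the KL term is non-negative and vanishes when $q$ equals the posterior $p_\theta(\cdot \mid y)$, minimizing $\mathcal{E}$ jointly over $(\theta, q)$ is equivalent to maximizing the marginal likelihood $p_\theta(y)$ over $\theta$.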

Abstract

Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret 'momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
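
To make the idea of adding momentum to both the parameters and the particles concrete, here is a minimal sketch in plain Python. It is not the paper's MPD algorithm or its integrator: it uses a naive semi-implicit Euler discretization, a heavy-ball-style update for $\theta$, and an underdamped-Langevin-style update for the particles. The toy model ($x_i \sim \mathcal{N}(\theta, 1)$, $y_i \sim \mathcal{N}(x_i, 1)$, whose MLE is the mean of $y$), the function name `momentum_particle_sketch`, and all step-size and damping values are assumptions chosen for illustration.

```python
import math
import random


def momentum_particle_sketch(y, n_particles=100, n_steps=1000, h=0.05,
                             gamma=2.0, seed=0):
    """Illustrative momentum-augmented particle scheme (NOT the paper's MPD).

    Assumed toy model: x_i ~ N(theta, 1), y_i ~ N(x_i, 1) for i = 1..d,
    so that the maximum likelihood estimate of theta is mean(y).
    """
    rng = random.Random(seed)
    d = len(y)
    theta, m = 0.0, 0.0  # parameter and its momentum
    # Particle cloud X (positions) with one momentum vector U per particle.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    U = [[0.0] * d for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * gamma * h)

    for _ in range(n_steps):
        # Particle average of grad_theta log p_theta(x, y) = sum_i (x_i - theta).
        grad_theta = sum(sum(x) - d * theta for x in X) / n_particles

        # Heavy-ball-style update for theta: damp the momentum, then move.
        m += h * (-gamma * m + grad_theta)
        theta_new = theta + h * m

        # Underdamped-Langevin-style update for each particle coordinate,
        # using the pre-update value of theta.
        for x, u in zip(X, U):
            for i in range(d):
                # grad_{x_i} log p_theta(x, y) = (theta - x_i) + (y_i - x_i)
                grad_x = theta + y[i] - 2.0 * x[i]
                u[i] += h * (-gamma * u[i] + grad_x) + noise_scale * rng.gauss(0.0, 1.0)
                x[i] += h * u[i]

        theta = theta_new

    return theta


if __name__ == "__main__":
    data = [0.5, 1.5, 2.0, 2.5, 3.5]  # sample mean is 2.0
    print("estimate:", momentum_particle_sketch(data))
    print("MLE (mean of y):", sum(data) / len(data))
```

In this sketch the estimate of $\theta$ should settle near the sample mean of `y`, up to small fluctuations caused by the finite number of particles. Setting the momentum variables aside and stepping directly along the gradients would give a PGD-style scheme, which is the natural comparison point.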

Paper Structure (55 sections, 25 theorems, 315 equations, 9 figures, 4 tables, 1 algorithm)


Introduction
Background
Gradient Flows on $\mathbb{R}^d$ and their Acceleration
Gradient Flows on $\mathcal{P} \left( \mathbb{R}^d \right)$ and their Acceleration
Gradient Flows on $\mathbb{R}^{d_\theta} \times \mathcal{P} \left( \mathbb{R}^{d_x} \right)$
Momentum-Enriched Flow for MLE
Convergence of Momentum-Enriched Flow
Momentum Particle Descent
Experiments
Toy Hierarchical Model
Why Accelerate Both Components?
Image Generation
Conclusion, Limitations, and Future work
Notation
Related Work
...and 40 more sections

Key Result

Proposition 3.1

Under the assumption labelled ass:gradlip in the paper, there exists a unique strong solution to the momentum-enriched flow (labelled eq:mpd_flow in the paper) for any initial condition $(\theta_0, m_0, q_0)$ in $\mathbb{R}^{2d_\theta}\times \mathcal{P}(\mathbb{R}^{2d_x})$.

Figures (9)

Figure 1: Toy Hierarchical Model. (a) Different regimes that arise from varying the momentum parameters; (b) comparison between our Exponential (Exp) integrator and a Nesterov-like integrator for different momentum parameters; (c) performance of MPD with and without gradient correction.
Figure 2: Comparison of MPD with algorithms that only enrich one component. (a), (b), and (c) show the performance on $\mathrm{ToyHM}(10,12)$ with different initializations of the particle cloud $\{X_0^i\}_{i=1}^M$. (d) shows $\hat{\mathsf{W}}_1$ against iterations for a density estimation problem on a Mixture of Gaussians (MoG) dataset. MPD is shown in blue, $\theta$-only-enriched in green, $X$-only-enriched in purple, and PGD in black. The results are averaged over $10$ independent trials.
Figure 3: Posterior Cloud vs Epochs. We show the evolution of the reconstruction of a particle for persistent methods. The particle is taken at epochs $\{10, 20, 40\}$ on MNIST and CIFAR-10.
Figure 4: The effect of the damping parameter $\gamma_\theta$ and momentum coefficient $\mu_\theta$ on MPD.
Figure 5: MNIST. Samples generated from various algorithms.
...and 4 more figures

Theorems & Definitions (56)

Proposition 3.1: Existence and uniqueness of strong solutions to eq:mpd_flow
proof
Theorem 4.1: Exponential convergence of the momentum-enriched flow
proof
Proposition 5.1
Definition 3.1: Fréchet differential on Wasserstein Space
Definition 3.2: Gradient Flow
Definition 3.3: Fréchet differential on $\mathbb{R}^{d_\theta} \times \mathcal{P}(\mathbb{R}^{d_x})$
Example 1: First Variation of $\mathcal{E}$
Proposition 4.1
...and 46 more
