Momentum Particle Maximum Likelihood

Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen

TL;DR

The paper addresses latent-variable MLE by reframing marginal likelihood maximization as the minimization of a free energy $\mathcal{E}(\theta,q)$ over an extended space of parameters and distributions, which makes momentum-augmented dynamics possible. It introduces the Momentum-Enriched Flow and the MPD algorithm, which injects momentum into both the model parameters and the latent variables, combining ideas from Nesterov acceleration and underdamped Langevin dynamics. The authors prove existence and uniqueness of the continuous-time flow, establish exponential convergence under a Log-Sobolev-type inequality, and justify the particle approximation via propagation of chaos. Empirically, MPD outperforms Particle Gradient Descent (PGD) and other baselines on a toy hierarchical model and on image-generation tasks (e.g., VAEs with VampPrior), indicating practical benefits for scalable, accelerated MLE in latent-variable models.

Abstract

Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret 'momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
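
The abstract does not write out the free energy functional. In the form that is standard for this family of methods (assumed here, along with the notation $y$ for the observed data, $x$ for the latent variables, and $p_\theta(x,y)$ for the joint model density), it reads:

$$
\mathcal{E}(\theta, q) \;=\; \int \log\frac{q(x)}{p_\theta(x, y)}\, q(x)\, \mathrm{d}x \;=\; -\log p_\theta(y) \;+\; \mathrm{KL}\big(q \,\|\, p_\theta(\cdot \mid y)\big).
$$

The second equality follows from $p_\theta(x,y) = p_\theta(y)\, p_\theta(x \mid y)$. Because the KL term is non-negative and vanishes exactly when $q$ is the posterior $p_\theta(\cdot \mid y)$, minimizing $\mathcal{E}$ jointly over $(\theta, q)$ is equivalent to maximizing the marginal log-likelihood $\log p_\theta(y)$.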

Paper Structure

This paper contains 55 sections, 25 theorems, 315 equations, 9 figures, 4 tables, 1 algorithm.

Key Result

Proposition 3.1

Under the gradient-Lipschitz assumption (ass:gradlip), there exists a unique strong solution to the momentum-enriched flow (eq:mpd_flow) for any initial condition $(\theta_0, m_0, q_0)$ in $\mathbb{R}^{2d_\theta}\times \mathcal{P}(\mathbb{R}^{2d_x})$.
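
To give a feel for the kind of system the proposition concerns, the following is a minimal sketch of a momentum-augmented particle scheme. It is not a transcription of eq:mpd_flow or of the paper's integrator: the drift is a generic combination of heavy-ball dynamics for $\theta$ and underdamped Langevin dynamics for the particles, the discretization is a plain semi-implicit Euler step, and no gradient correction is used. The two-level Gaussian model ($x_i \sim \mathcal{N}(\theta, 1)$, $y_i \mid x_i \sim \mathcal{N}(x_i, 1)$), the function name, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def momentum_particle_sketch(y, n_particles=200, n_steps=2000, h=0.05,
                             gamma_theta=1.0, gamma_x=1.0, seed=0):
    """Illustrative momentum-augmented particle scheme (hypothetical helper).

    Assumed toy model: x_i ~ N(theta, 1), y_i | x_i ~ N(x_i, 1), so that
    log p_theta(x, y) = -0.5*sum((x - theta)^2) - 0.5*sum((y - x)^2) + const.
    """
    rng = np.random.default_rng(seed)
    d = y.shape[0]

    theta, m = 0.0, 0.0                         # parameter and its momentum
    X = rng.standard_normal((n_particles, d))   # particle positions
    U = np.zeros((n_particles, d))              # particle momenta

    for _ in range(n_steps):
        # Particle estimate of the free-energy gradient in theta:
        # minus the particle average of grad_theta log p_theta(X, y).
        grad_theta = -np.mean(np.sum(X - theta, axis=1))
        # grad_x log p_theta(X, y), evaluated for every particle.
        grad_x = (theta - X) + (y - X)

        # Heavy-ball step for theta: damped momentum, then position.
        m += h * (-gamma_theta * m - grad_theta)
        theta += h * m

        # Underdamped Langevin step for the particles:
        # friction and noise act on the momenta only.
        noise = rng.standard_normal(X.shape)
        U += h * (-gamma_x * U + grad_x) + np.sqrt(2.0 * gamma_x * h) * noise
        X += h * U

    return theta, X


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    # Marginally y_i ~ N(theta, 2); data simulated here with theta = 1.
    y = 1.0 + np.sqrt(2.0) * data_rng.standard_normal(10)
    theta_hat, particles = momentum_particle_sketch(y)
    print("estimate:", theta_hat, "sample mean of y:", y.mean())
```

For this assumed model the marginal likelihood is maximized at the sample mean of $y$, so the returned estimate can be checked against `y.mean()`. The damping values `gamma_theta` and `gamma_x` control how oscillatory the trajectory is.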

Figures (9)

  • Figure 1: Toy Hierarchical Model. (a) Different regimes that arise from varying the momentum parameters; (b) comparison between our Exponential (Exp) integrator and a Nesterov-like integrator for different momentum parameters; (c) comparison of MPD with and without gradient correction.
  • Figure 2: Comparison of MPD with algorithms that only enrich one component. (a), (b), and (c) show the performance on $\rm{ToyHM}(10,12)$ with different initializations of the particle cloud $\{X_0^i\}_{i=1}^M$. (d) shows $\hat{\sf{W}}_1$ against iterations for a density estimation problem on a Mixture of Gaussians (MoG) dataset. MPD is shown in blue, $\theta$-only-enriched in green, $X$-only-enriched in purple, and PGD in black. The results are averaged over $10$ independent trials.
  • Figure 3: Posterior Cloud vs Epochs. We show the evolution of the reconstruction of a particle for persistent methods. The particle is taken at epochs $\{10, 20, 40\}$ on MNIST and CIFAR-10.
  • Figure 4: The effect of the damping parameter $\gamma_\theta$ and momentum coefficient $\mu_\theta$ on MPD.
  • Figure 5: MNIST. Samples generated from various algorithms.
  • ...and 4 more figures

Theorems & Definitions (56)

  • Proposition 3.1: Existence and uniqueness of strong solutions to the momentum-enriched flow (eq:mpd_flow)
  • proof
  • Theorem 4.1: Exponential convergence of the momentum-enriched flow
  • proof
  • Proposition 5.1
  • Definition 3.1: Fréchet differential on Wasserstein Space
  • Definition 3.2: Gradient Flow
  • Definition 3.3: Fréchet differential on $\mathbb{R}^{d_\theta} \times \mathcal{P}(\mathbb{R}^{d_x})$
  • Example 1: First Variation of $\mathcal{E}$
  • Proposition 4.1
  • ...and 46 more