Momentum Particle Maximum Likelihood
Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen
TL;DR
The paper addresses latent-variable MLE by reframing the marginal likelihood optimization as free energy minimization $\mathcal{E}(\theta,q)$ on an extended space, enabling momentum-augmented dynamics. It introduces Momentum-Enriched Flow and the MPD algorithm, which injects momentum into both model parameters and latent variables, combining ideas from Nesterov acceleration and underdamped Langevin dynamics. The authors prove existence and uniqueness of the continuous-time flow, establish exponential convergence under a Log-Sobolev-type inequality, and justify the particle approximation via propagation of chaos. Empirically, MPD outperforms Particle Gradient Descent and baselines on toy hierarchies and image-generation tasks (e.g., VAEs with VampPrior), indicating practical benefits for scalable, accelerated MLE in latent-variable models.
Abstract
Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret `momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
