Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality
Rocco Caprio, Adam M Johansen
TL;DR
This work analyzes the Expectation Maximization (EM) algorithm for latent-variable models through a Euclidean–Wasserstein gradient-flow lens, casting EM as alternating minimization of a free energy $F(\theta,q)$ over the product space $\mathcal{M}_2=\mathbb{R}^{d_\theta}\times\mathcal{P}_2(\mathbb{R}^{d_x})$. Under a smoothness condition and an extended log-Sobolev inequality with constant $\lambda$, the authors derive non-asymptotic exponential convergence of the free energy and, via an extended Talagrand inequality, of the EM iterates themselves, with rates governed by $\lambda$ and the Lipschitz constants. The paper also analyzes several EM variants (first-order EM, Langevin EM, alternating gradient descent, and full gradient descent), gives corresponding non-asymptotic bounds, and highlights a hierarchy in which vanilla EM often converges fastest in iteration count for the models considered. The approach yields a unified framework for EM convergence with continuous latent spaces and suggests extensions to slower or local convergence regimes via weaker analogues of the functional inequalities. Limitations include the reliance on the extended log-Sobolev inequality (which holds only for well-behaved hierarchical models) and the focus on continuous latent spaces; the authors discuss potential extensions to non-smooth settings and alternative geometries, as well as Monte Carlo approximations.
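For orientation, the alternating-minimization view of EM can be written out explicitly. The display below uses the standard free-energy formulation of EM (in the style of Neal and Hinton). The symbols $y$ (the observed data), $p_\theta(x,y)$ (the joint density of latent variable and data) and the iteration index $k$ are notational assumptions that this summary does not fix, and the paper's own conventions may differ.

$$
F(\theta,q)=\int \log\frac{q(x)}{p_\theta(x,y)}\,q(x)\,dx
=\mathrm{KL}\big(q\,\|\,p_\theta(\cdot\mid y)\big)-\log p_\theta(y),
$$

$$
q_{k+1}=\arg\min_{q}F(\theta_k,q)=p_{\theta_k}(\cdot\mid y)\quad\text{(E-step)},\qquad
\theta_{k+1}=\arg\min_{\theta}F(\theta,q_{k+1})\quad\text{(M-step)}.
$$

Since the KL term vanishes at the E-step minimizer, $F(\theta_k,q_{k+1})=-\log p_{\theta_k}(y)$, so decreasing the free energy along the iterations corresponds to increasing the marginal likelihood.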
Abstract
We present a new framework for analysing the Expectation Maximization (EM) algorithm. Drawing on recent advances in the theory of gradient flows over Euclidean-Wasserstein spaces, we extend techniques from alternating minimization in Euclidean spaces to the EM algorithm, via its representation as coordinate-wise minimization of the free energy. In so doing, we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of the log-Sobolev inequality. We further show that this framework naturally extends to several variants of EM, offering a unified approach for studying such algorithms.
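As a rough illustration of coordinate-wise minimization of the free energy, the following minimal sketch runs EM on a toy model chosen here for convenience and not taken from the paper: a scalar latent $x\sim N(\theta,1)$ with a single observation $y\mid x\sim N(x,1)$. For this assumed model both steps are available in closed form, and the free-energy gap happens to shrink geometrically. All function names are hypothetical.

```python
import math

def e_step(theta, y):
    """E-step: the minimizer over q is the posterior p_theta(x | y).
    For x ~ N(theta, 1), y | x ~ N(x, 1) this is N((theta + y) / 2, 1/2)."""
    return (theta + y) / 2.0, 0.5  # (mean, variance)

def m_step(q_mean):
    """M-step: minimize F(theta, q) over theta.
    For this toy model the minimizer is the mean of q."""
    return q_mean

def free_energy_at_posterior(theta, y):
    """F(theta, p_theta(.|y)) = -log p_theta(y), where marginally y ~ N(theta, 2)."""
    return (y - theta) ** 2 / 4.0 + 0.5 * math.log(4.0 * math.pi)

def run_em(theta0, y, n_iters):
    """Alternate E- and M-steps and return the list of parameter iterates."""
    thetas = [theta0]
    for _ in range(n_iters):
        q_mean, _q_var = e_step(thetas[-1], y)
        thetas.append(m_step(q_mean))
    return thetas

y = 3.0
thetas = run_em(theta0=0.0, y=y, n_iters=10)

# The free energy is minimized at theta = y in this toy model.
f_star = free_energy_at_posterior(y, y)
gaps = [free_energy_at_posterior(t, y) - f_star for t in thetas]

for k, (t, g) in enumerate(zip(thetas, gaps)):
    print(f"iter {k:2d}  theta = {t:.6f}  free-energy gap = {g:.3e}")
```

In this assumed example the update reduces to $\theta_{k+1}=(\theta_k+y)/2$, so the parameter error halves and the free-energy gap falls by a factor of four at every iteration. This is only a one-dimensional instance of exponential convergence and says nothing about the constants in the paper's bounds.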
