Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

Rocco Caprio, Adam M Johansen

TL;DR

This work analyzes the Expectation Maximization (EM) algorithm for latent-variable models through a Euclidean–Wasserstein gradient-flow lens, casting EM as alternating minimization of a free energy $F(\theta,q)$ on the product space $\mathcal{M}_2=\mathbb{R}^{d_\theta}\times\mathcal{P}_2(\mathbb{R}^{d_x})$. Under a smoothness condition and an extended log-Sobolev inequality with constant $\lambda$, the authors derive non-asymptotic exponential convergence for the free energy and, via an extended Talagrand inequality, for the EM iterates themselves, with rates governed by $\lambda$ and the Lipschitz constants. The paper also analyzes several EM variants (first-order EM, Langevin EM, alternating gradient descent, and full gradient descent), providing corresponding non-asymptotic bounds and highlighting a hierarchy in which vanilla EM often converges fastest in iteration count for the models considered. The approach yields a unified framework for EM convergence in continuous latent spaces and suggests extensions to slower or local convergence regimes via weaker functional-inequality analogues. Limitations include reliance on the extended log-Sobolev inequality (which holds only for well-behaved hierarchical models) and the focus on continuous latent spaces; the authors discuss potential extensions to non-smooth settings and alternative geometries, as well as Monte Carlo approximations.

Abstract

We present a new framework for analysing the Expectation Maximization (EM) algorithm. Drawing on recent advances in the theory of gradient flows over Euclidean-Wasserstein spaces, we extend techniques from alternating minimization in Euclidean spaces to the EM algorithm, via its representation as coordinate-wise minimization of the free energy. In so doing, we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of the log-Sobolev inequality. We further show that this framework naturally extends to several variants of EM, offering a unified approach for studying such algorithms.
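To make the "coordinate-wise minimization of the free energy" picture concrete, the following is a minimal sketch on a hypothetical toy model that is not taken from the paper: a latent $x \sim N(\theta, 1)$ with a single observation $y \mid x \sim N(x, 1)$. For this model both EM steps are available in closed form, and the free energy gap shrinks by a constant factor per iteration. The function names `free_energy` and `run_em`, and the restriction of $q$ to Gaussians $N(m, s^2)$, are assumptions made for illustration only.

```python
import math


def free_energy(theta, m, s2, y):
    """F(theta, q) = E_q[log q(x) - log p_theta(x, y)] for q = N(m, s2).

    Toy model (an illustrative assumption): x ~ N(theta, 1), y | x ~ N(x, 1).
    """
    # E_q[-log p_theta(x, y)]: two unit-variance Gaussian log-densities.
    expected_neg_log_joint = (
        math.log(2 * math.pi)
        + 0.5 * ((m - theta) ** 2 + s2)
        + 0.5 * ((y - m) ** 2 + s2)
    )
    # E_q[log q(x)] is minus the entropy of N(m, s2).
    neg_entropy = -0.5 * math.log(2 * math.pi * math.e * s2)
    return expected_neg_log_joint + neg_entropy


def run_em(theta0, y, n_iters):
    """Run EM as alternating minimization of F; return (theta, F) per iteration."""
    theta = theta0
    history = []
    for _ in range(n_iters):
        # E-step: minimize F over q with theta fixed.
        # The minimizer is the posterior p_theta(x | y) = N((theta + y) / 2, 1/2).
        m, s2 = 0.5 * (theta + y), 0.5
        # M-step: minimize F over theta with q fixed.
        # The minimizer is the mean of q.
        theta = m
        history.append((theta, free_energy(theta, m, s2, y)))
    return history


if __name__ == "__main__":
    y = 3.0
    # Minimum of F over (theta, q): attained at theta = y, q = N(y, 1/2).
    f_star = free_energy(y, y, 0.5, y)
    for n, (theta, f) in enumerate(run_em(theta0=0.0, y=y, n_iters=10), start=1):
        print(f"iter {n:2d}  theta = {theta:.6f}  F - F* = {f - f_star:.3e}")
```

In this toy model the update reduces to $\theta_{n+1} = (\theta_n + y)/2$, so the free energy gap contracts by a factor of $1/4$ per iteration. This only illustrates the kind of exponential decay the abstract refers to; it says nothing about the constants in the paper's bounds.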

Paper Structure

This paper contains 24 sections, 26 theorems, 110 equations, 1 figure.

Key Result

Proposition 1

The steps of the EM iterations are equivalent to coordinate-wise (alternating) minimization of the free energy $F(\theta,q)$.
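For orientation, here is a sketch of the alternating-minimization form in the standard free-energy notation. The symbols $p_\theta$, $y$, and the iteration index $n$, as well as the order of the two updates, are assumptions and may differ from the paper's own statement of Proposition 1.

```latex
% Free energy for a latent-variable model p_\theta(x, y) with observed y:
F(\theta, q)
  = \int q(x) \log \frac{q(x)}{p_\theta(x, y)} \, \mathrm{d}x
  = \mathrm{KL}\bigl(q \,\|\, p_\theta(\cdot \mid y)\bigr) - \log p_\theta(y).

% E-step: minimize over q with \theta fixed (the minimizer is the posterior).
q_{n+1} = \operatorname*{arg\,min}_{q \in \mathcal{P}_2(\mathbb{R}^{d_x})} F(\theta_n, q)
        = p_{\theta_n}(\cdot \mid y).

% M-step: minimize over \theta with q fixed.
\theta_{n+1} = \operatorname*{arg\,min}_{\theta \in \mathbb{R}^{d_\theta}} F(\theta, q_{n+1}).
```

The second form of $F$ shows why the E-step returns the posterior: the KL term is the only part that depends on $q$, and it vanishes exactly when $q = p_\theta(\cdot \mid y)$, leaving the negative log marginal likelihood.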

Figures (1)

  • Figure 1: EM and its variants' free energy and their bounds

Theorems & Definitions (47)

  • Proposition 1
  • Definition 1: Extended log-Sobolev inequality
  • Definition 2: Extension of the Talagrand inequality
  • Theorem 2: Theorem 2 in Caprio2025
  • Lemma 3
  • proof
  • Proposition 4
  • proof
  • Corollary 5
  • Lemma 6: Descent lemma on $\mathcal{P}_2(\mathbb{R}^{d_x})$
  • ...and 37 more