Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

Rocco Caprio; Adam M Johansen

TL;DR

This work analyzes the Expectation Maximization algorithm for latent-variable models through a Euclidean–Wasserstein gradient-flow lens, casting EM as alternating minimization of a free energy $F(\theta,q)$ on the product space $\mathcal{M}_2=\mathbb{R}^{d_\theta}\times\mathcal{P}_2(\mathbb{R}^{d_x})$. Under a smoothness condition and an extended log-Sobolev inequality with constant $\lambda$, the authors derive non-asymptotic exponential convergence for the free energy and, via an extended Talagrand inequality, for the EM iterates themselves, with rates governed by $\lambda$ and the Lipschitz constants. The paper also analyzes several EM variants (first-order EM, Langevin EM, alternating gradient descent, and full gradient descent), provides corresponding non-asymptotic bounds, and highlights a hierarchy in which vanilla EM often converges fastest in iteration count for the models considered. The approach yields a unified framework for EM convergence in continuous latent spaces and suggests extensions to slower or local convergence regimes via weaker analogues of the functional inequalities. Limitations include reliance on the extended log-Sobolev inequality (which holds only for well-behaved hierarchical models) and the focus on continuous latent spaces; the authors discuss potential extensions to non-smooth settings and alternative geometries, as well as Monte Carlo approximations.

Abstract

We present a new framework for analysing the Expectation Maximization (EM) algorithm. Drawing on recent advances in the theory of gradient flows over Euclidean-Wasserstein spaces, we extend techniques from alternating minimization in Euclidean spaces to the EM algorithm, via its representation as coordinate-wise minimization of the free energy. In so doing, we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of the log-Sobolev inequality. We further show that this framework naturally extends to several variants of EM, offering a unified approach for studying such algorithms.
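As a rough illustration of the exponential convergence described above, the following sketch runs EM on a toy hierarchical Gaussian model. The model is an assumption chosen for this example and is not taken from the paper: latents $x_i \sim N(\theta, 1)$ and observations $y_i \mid x_i \sim N(x_i, 1)$, so that both EM steps are available in closed form. The function names (`neg_log_lik`, `em_step`, `run_em`) are hypothetical. After each E-step the free energy equals the negative log-likelihood, so the gap to its minimum can be tracked directly.

```python
import numpy as np


def neg_log_lik(theta, y):
    """Negative log-likelihood of the toy model (assumed, not from the paper).

    Marginally y_i ~ N(theta, 2). This equals the free energy F(theta, q)
    evaluated at the exact posterior q, i.e. right after an E-step.
    """
    return np.sum((y - theta) ** 2) / 4.0 + 0.5 * y.size * np.log(4.0 * np.pi)


def em_step(theta, y):
    """One EM iteration for x_i ~ N(theta, 1), y_i | x_i ~ N(x_i, 1)."""
    # E-step: the posterior of x_i given y_i is N((theta + y_i) / 2, 1/2),
    # which minimizes the free energy over q for fixed theta.
    post_mean = 0.5 * (theta + y)
    # M-step: minimizing the free energy over theta for fixed q gives the
    # average of the posterior means.
    return post_mean.mean()


def run_em(y, theta0, n_iters):
    """Return the sequence of EM parameter iterates, including theta0."""
    thetas = [theta0]
    for _ in range(n_iters):
        thetas.append(em_step(thetas[-1], y))
    return np.array(thetas)


rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=50)  # synthetic data

thetas = run_em(y, theta0=-5.0, n_iters=10)
f_star = neg_log_lik(y.mean(), y)  # the MLE of theta is the sample mean
gaps = np.array([neg_log_lik(t, y) for t in thetas]) - f_star
ratios = gaps[1:] / gaps[:-1]  # per-iteration contraction of the gap

print("final theta:", thetas[-1], "MLE:", y.mean())
print("gap ratios:", ratios)
```

In this toy model the parameter error halves at every iteration, so the free-energy gap contracts geometrically. This is a property of the specific Gaussian example and is not a statement of the paper's general rate, which depends on $\lambda$ and the Lipschitz constants.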

Paper Structure (24 sections, 26 theorems, 110 equations, 1 figure)

Introduction
The Expectation Maximization Algorithm
Relevant literature
A differential analysis of the EM Algorithm
Non-asymptotic analysis of EM and related algorithms
EM Algorithm
First-order EM
Langevin EM
Gradient Descents
Discussion
Notation
Riemannian structure of $\mathcal{M}_2$
On the extended log-Sobolev inequality
Bakry--Émery and strongly log-concave models
Operations preserving the extended log-Sobolev inequality and models with different completions
...and 9 more sections

Key Result

Proposition 1

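The steps of the EM iterations are equivalent to alternating (coordinate-wise) minimization of the free energy $F(\theta,q)$: the E-step minimizes over $q$ with $\theta$ held fixed, and the M-step minimizes over $\theta$ with $q$ held fixed.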
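The displayed equation for this proposition did not survive extraction. A plausible reconstruction, based on the TL;DR's description of EM as alternating minimization of $F(\theta,q)$ over $\mathbb{R}^{d_\theta}\times\mathcal{P}_2(\mathbb{R}^{d_x})$, is sketched below. The iteration indexing and the explicit form of the free energy (the standard one for latent-variable models with joint density $p_\theta(x,y)$) are assumptions and may differ from the paper's exact statement.

```latex
% Assumed standard free energy for a latent-variable model with joint density p_\theta(x, y):
\[
  F(\theta, q) = \int \log\!\left(\frac{q(x)}{p_\theta(x, y)}\right) q(x)\,\mathrm{d}x .
\]
% EM as coordinate-wise minimization of F (iteration indexing is an assumption):
\begin{align*}
  \text{(E-step)}\quad q_{n+1} &\in \operatorname*{arg\,min}_{q \in \mathcal{P}_2(\mathbb{R}^{d_x})} F(\theta_n, q), \\
  \text{(M-step)}\quad \theta_{n+1} &\in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^{d_\theta}} F(\theta, q_{n+1}).
\end{align*}
```

Under this form, the E-step minimizer is the posterior $p_{\theta_n}(\cdot \mid y)$, at which the free energy equals the negative log-likelihood $-\log p_{\theta_n}(y)$.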

Figures (1)

Figure 1: EM and its variants' free energy and their bounds

Theorems & Definitions (47)

Proposition 1
Definition 1: Extended log-Sobolev inequality
Definition 2: Extension of the Talagrand inequality
Theorem 2: Theorem 2 in Caprio2025
Lemma 3
proof
Proposition 4
proof
Corollary 5
Lemma 6: Descent lemma on $\mathcal{P}_2(\mathbb{R}^{d_x})$
...and 37 more
