Stochastic Approximation with Biased MCMC for Expectation Maximization

Samuel Gruffaz; Kyurae Kim; Alain Oliviero Durmus; Jacob R. Gardner

TL;DR

This paper studies empirical Bayes inference with EM when the E-step is intractable and is replaced by stochastic approximation with MCMC (MCMC-SAEM). It focuses on what happens when the MCMC kernel is asymptotically biased, as with the unadjusted Langevin algorithm (ULA), rather than asymptotically unbiased, as with the Metropolis-adjusted Langevin algorithm (MALA). The paper gives both asymptotic and non-asymptotic analyses of MCMC-SAEM under bias, establishing bias-propagation bounds and high-probability convergence guarantees that depend on the smoothness of the objective through a Lyapunov function $V$. Experiments on synthetic logistic regression, pharmacokinetics, robust Poisson regression, and logistic regression with automatic relevance determination (ARD) show that ULA is more stable than MALA with respect to the Langevin stepsize and can sometimes converge faster. Overall, the work supports the use of approximate MCMC within SAEM, offering theoretical guarantees and practical guidance for bias-aware algorithm design in latent-variable models.

Abstract

The expectation maximization (EM) algorithm is a widespread method for empirical Bayesian inference, but its expectation step (E-step) is often intractable. Employing a stochastic approximation scheme with Markov chain Monte Carlo (MCMC) can circumvent this issue, resulting in an algorithm known as MCMC-SAEM. While theoretical guarantees for MCMC-SAEM have previously been established, these results are restricted to the case where asymptotically unbiased MCMC algorithms are used. In practice, MCMC-SAEM is often run with asymptotically biased MCMC, for which the consequences are theoretically less understood. In this work, we fill this gap by analyzing the asymptotics and non-asymptotics of SAEM with biased MCMC steps, particularly the effect of bias. We also provide numerical experiments comparing the Metropolis-adjusted Langevin algorithm (MALA), which is asymptotically unbiased, and the unadjusted Langevin algorithm (ULA), which is asymptotically biased, on synthetic and real datasets. Experimental results show that ULA is more stable with respect to the choice of Langevin stepsize and can sometimes result in faster convergence.
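As a rough illustration of the scheme the abstract describes, the sketch below runs SAEM with a single ULA step per iteration on a toy model that is not taken from the paper: latent $z_i \sim \mathcal{N}(\theta, 1)$ and observations $y_i \mid z_i \sim \mathcal{N}(z_i, 1)$. The model, the function name `saem_ula`, the stepsize `eta`, and the schedule $\gamma_k = 1/k$ are all assumptions made for illustration. In this toy model the sufficient statistic is the mean of the latents and the M-step is the identity map, so the marginal maximum-likelihood estimate is the sample mean of $y$.

```python
import numpy as np

def saem_ula(y, n_iters=2000, eta=0.05, seed=0):
    """Toy MCMC-SAEM with one ULA step per iteration (illustrative assumptions only).

    Model (hypothetical): z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1).
    Sufficient statistic: S(z) = mean(z); M-step: theta = s.
    """
    rng = np.random.default_rng(seed)
    z = y.copy()      # initialize the latent chain at the observations
    s = 0.0           # running estimate of the expected sufficient statistic
    theta = 0.0
    for k in range(1, n_iters + 1):
        # Gradient of log p(z | y, theta) for this Gaussian toy model.
        grad = (theta - z) + (y - z)
        # ULA step: Langevin proposal with no Metropolis correction.
        z = z + eta * grad + np.sqrt(2.0 * eta) * rng.standard_normal(z.shape)
        # Stochastic approximation update of the sufficient statistic.
        gamma = 1.0 / k
        s = s + gamma * (z.mean() - s)
        # M-step, available in closed form here.
        theta = s
    return theta

data_rng = np.random.default_rng(1)
y = 2.0 + np.sqrt(2.0) * data_rng.standard_normal(500)  # marginally y_i ~ N(2, 2)
theta_hat = saem_ula(y)
print("SAEM estimate:", theta_hat, "sample mean:", y.mean())
```

For this Gaussian target the stationary distribution of ULA has the correct mean and only an inflated variance, so the estimate of $\theta$ is not visibly biased here. Models whose sufficient statistics involve second moments, or non-Gaussian posteriors, are where the bias analyzed in the paper would show up.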

Paper Structure (46 sections, 10 theorems, 88 equations, 4 figures, 3 tables)

INTRODUCTION
Contributions
BACKGROUND
Expectation-Maximization
EM as a Root Finding Problem
EM as Stochastic Approximation
MCMC-SAEM
MCMC-SAEM with Approximate MCMC Algorithms
ULA and MALA
Stochastic Approximation with Biased Dynamics
ASYMPTOTIC ANALYSIS
Technical Assumptions
Main Result
NON-ASYMPTOTIC ANALYSIS
Technical Assumptions
...and 31 more sections

Key Result

Lemma 1

Under the assumptions labeled exponential_family, Maximization, and regularity_loss, $V$ is $p$-times continuously differentiable and satisfies, for any $s\in \mathbb{R}^d$,

Figures (4)

Figure 1: Trajectory of the MCMC-SAEM iterates for $\theta_2$ with a large MALA/ULA stepsize of $\eta = 5 \times 10^{-3}$. MALA only makes "occasional" progress due to rejections, while ULA makes progress nonetheless, albeit with some asymptotic bias. The dotted line marks the true value $\theta_2^*$.
Figure 2: Test average marginal log-predictive density (LPD) for the pharmacokinetics model versus the MALA/ULA stepsize $\eta$. The colored bands are 80% bootstrap confidence intervals of the mean computed from $32$ independent train-test splits of a ratio of $9:3$.
Figure 3: Test average log-predictive density (LPD) for robust Poisson regression versus the MALA/ULA stepsize $\eta$. ULA is more robust against the choice of stepsize on azpro. The colored bands are 80% bootstrap confidence intervals of the mean computed from $32$ independent train-test splits of a ratio of $8:1$.
Figure 4: Test average log-predictive density (LPD) for logistic regression with automatic relevance determination versus MALA/ULA stepsize $\eta$. The colored bands are 80% bootstrap confidence intervals of the mean computed from $32$ independent train-test splits of a ratio of $8:1$.
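The stepsize behavior described in the Figure 1 caption can be reproduced qualitatively on a toy target. The sketch below is not the paper's experiment: the target is a hypothetical 50-dimensional zero-mean Gaussian with diagonal precisions between 1 and 100, and the stepsizes, function names, and chain lengths are assumptions chosen for illustration. It measures the MALA acceptance rate, and how far the ULA variance in the stiffest coordinate drifts from the target variance.

```python
import numpy as np

def grad_log_target(x, prec):
    # Gradient of log N(0, diag(1 / prec)).
    return -prec * x

def log_target(x, prec):
    return -0.5 * np.sum(prec * x**2)

def ula_step(x, eta, prec, rng):
    # Unadjusted Langevin step: always moves, but targets a biased distribution.
    noise = rng.standard_normal(x.shape)
    return x + eta * grad_log_target(x, prec) + np.sqrt(2.0 * eta) * noise

def mala_step(x, eta, prec, rng):
    # Langevin proposal followed by a Metropolis-Hastings accept/reject test.
    prop = ula_step(x, eta, prec, rng)

    def log_q(to, frm):
        mean = frm + eta * grad_log_target(frm, prec)
        return -np.sum((to - mean) ** 2) / (4.0 * eta)

    log_alpha = (log_target(prop, prec) + log_q(x, prop)
                 - log_target(x, prec) - log_q(prop, x))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False

def mala_acceptance_rate(eta, prec, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(prec.shape) / np.sqrt(prec)  # start from the target
    accepted = 0
    for _ in range(n_steps):
        x, ok = mala_step(x, eta, prec, rng)
        accepted += ok
    return accepted / n_steps

def ula_variance_ratio(eta, prec, n_steps=5000, seed=0):
    # Empirical ULA variance of the stiffest coordinate, relative to the target's.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(prec.shape) / np.sqrt(prec)
    i = int(np.argmax(prec))
    samples = np.empty(n_steps)
    for t in range(n_steps):
        x = ula_step(x, eta, prec, rng)
        samples[t] = x[i]
    return samples.var() * prec[i]

prec = np.linspace(1.0, 100.0, 50)
for eta in (1e-3, 1.5e-2):
    print(eta, mala_acceptance_rate(eta, prec), ula_variance_ratio(eta, prec))
```

On this toy target the larger stepsize should leave MALA rejecting almost every proposal, while ULA keeps moving at the cost of an inflated variance in the stiff direction. This mirrors the trade-off the caption describes, without reproducing the paper's models or numbers.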

Theorems & Definitions (14)

Lemma 1
proof
Theorem 1
proof
Theorem 2
proof
Lemma 2
Theorem 3
Lemma 3
proof
...and 4 more
