Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

Vishwak Srinivasan; Andre Wibisono; Ashia Wilson

TL;DR

The paper tackles constrained sampling from a distribution $\Pi$ with density $\pi(x) \propto e^{-f(x)}$ on a compact convex set $\mathcal{K}$ by introducing the Metropolis-adjusted Mirror Langevin algorithm (MAMLA). MAMLA augments a single Mirror Langevin step with a Metropolis-Hastings filter, yielding an unbiased, reversible Markov chain whose stationary distribution is the target $\Pi$, and it comes with non-asymptotic mixing-time guarantees under standard self-concordance, relative convexity, and relative smoothness assumptions. The authors derive explicit $\delta$-mixing-time bounds in terms of the step size $h$, the dimension $d$, mirror-barrier parameters, and problem constants, with logarithmic dependence on the error tolerance $\delta$, an improvement over unadjusted discretisations; they also show affine-invariant guarantees in the special Newton-Langevin case. The framework covers uniform sampling over polytopes and intersections of ellipsoids as well as sampling from Dirichlet distributions, with concrete corollaries and a detailed discussion of implementation costs, including Hessian evaluations and Cholesky factorizations. Numerical experiments corroborate the theory, showing favorable mixing-time behavior, high acceptance rates, and clear advantages over the unadjusted Mirror Langevin algorithm (MLA), which makes MAMLA a principled approach to constrained Bayesian inference and related tasks.

Abstract

We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings.
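The accept-reject construction described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the box $(-1, 1)^d$ as the constraint set and the log-barrier $\phi(x) = -\sum_i [\log(1 - x_i) + \log(1 + x_i)]$ as the mirror function, chosen here only because $\nabla\phi$ has a closed-form inverse and a diagonal Hessian. All function names (`grad_phi`, `mamla_step`, `run_chain`) are hypothetical. The proposal is one Mirror Langevin step taken in the dual space, and the filter uses the proposal density expressed in primal coordinates, which includes the Jacobian factor $\det \nabla^2\phi(z)$.

```python
import numpy as np

# Assumed setup: K = (-1, 1)^d with a log-barrier mirror function.
def grad_phi(x):
    return 2.0 * x / (1.0 - x**2)

def grad_phi_inv(y):
    # Closed-form inverse of grad_phi, written to avoid division by zero at y = 0.
    return y / (1.0 + np.sqrt(1.0 + y**2))

def hess_phi_diag(x):
    # Diagonal of the Hessian of phi.
    return 2.0 * (1.0 + x**2) / (1.0 - x**2) ** 2

def log_proposal(x, z, grad_f, h):
    """Log density (up to an additive constant) of proposing z from x."""
    hx, hz = hess_phi_diag(x), hess_phi_diag(z)
    resid = grad_phi(z) - grad_phi(x) + h * grad_f(x)
    # Gaussian in the dual space with covariance 2h * Hess phi(x),
    # times the Jacobian det(Hess phi(z)) of the map z -> grad_phi(z).
    return (-0.5 * np.sum(np.log(hx)) + np.sum(np.log(hz))
            - np.sum(resid**2 / hx) / (4.0 * h))

def mamla_step(x, f, grad_f, h, rng):
    """One Mirror Langevin proposal followed by a Metropolis-Hastings filter."""
    xi = rng.standard_normal(x.shape)
    y = grad_phi(x) - h * grad_f(x) + np.sqrt(2.0 * h * hess_phi_diag(x)) * xi
    z = grad_phi_inv(y)
    if not np.all(np.abs(z) < 1.0):  # guard against floating-point round-off
        return x, False
    log_alpha = (f(x) - f(z)
                 + log_proposal(z, x, grad_f, h)
                 - log_proposal(x, z, grad_f, h))
    if np.log(rng.uniform()) < log_alpha:
        return z, True
    return x, False

def run_chain(x0, f, grad_f, h, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    x, samples, n_acc = np.asarray(x0, dtype=float), [], 0
    for _ in range(n_steps):
        x, accepted = mamla_step(x, f, grad_f, h, rng)
        n_acc += accepted
        samples.append(x.copy())
    return np.array(samples), n_acc / n_steps
```

For instance, `run_chain(np.zeros(2), lambda x: 0.0, lambda x: np.zeros_like(x), h=0.1, n_steps=500)` targets the uniform distribution on the square. Dropping the accept-reject branch and always returning `z` recovers the unadjusted Mirror Langevin step, which is the variant the abstract describes as asymptotically biased.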

Paper Structure (41 sections, 20 theorems, 144 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Related work
Preliminaries
Function classes
Markov chains, conductance and mixing time
Metropolis-adjusted Mirror Langevin algorithm
Mixing time analysis
A discussion of the result in Theorem 3.1
Applications of Algorithm 1 with provable guarantees
Uniform sampling over polytopes and intersection of ellipsoids
Sampling from Dirichlet distributions
Implementation details
Numerical experiments
Uniform sampling
Mixing time versus dimension
...and 26 more sections

Key Result

Theorem 3.1

Consider a distribution $\Pi$ with density $\pi(x) \propto e^{-f(x)}$ that is supported on a compact and convex set $\mathcal{K} \subset \mathbb{R}^{d}$, and a mirror map $\phi : \mathcal{K} \to \mathbb{R} \cup \{\infty\}$. If $f$ and $\phi$ satisfy the paper's assumptions (self-concordance of $\phi$ through relative Lipschitz continuity of $f$), then there exist universal constants $C^{(1)}, \ldots, C^{(5)}$ such that for any $0 < h \leq h_{\max}$, $\delta$ ...

Figures (7)

Figure 1: Progression of Algorithm 1 on an $\mathsf{Ellipsoid}(M)$ with $d = 2$, $\lambda_{1}(M) = 1$, and $\lambda_{2}(M) = 4$. Points coloured in green are contained in $\mathsf{Ellipsoid}^{1/2}(M)$. "Prop" is the proportion of points in the regions. Note that $\widehat{\tau}_{\mathrm{mix}}$ is at most $60$ in this case.
Figure 2: (Empirical) Mixing time versus dimension. The orange line corresponds to $h \propto d^{-1}$, and the blue line corresponds to $h \propto d^{-3/2}$.
Figure 3: Progression of Algorithm 1 for sampling from a Dirichlet distribution with $d = 2$ and $a_{i} = 6$ for $i \in [3]$.
Figure 4: (Empirical) Mixing time versus dimension. The orange line corresponds to $h \propto d^{-3/2}$, and the blue line corresponds to $h \propto d^{-2}$.
Figure 5: Variation of the empirical $2$-Wasserstein distance with iterations for Algorithm 1 and MLA.
...and 2 more figures

Theorems & Definitions (36)

Definition 2.1 (Nesterov, 2018)
Definition 2.2
Definition 2.3
Definition 2.4 (Jiang, 2021; Ahn and Chewi, 2021)
Definition 2.5 (Laddha et al., 2020)
Theorem 3.1
Corollary 3.1
Corollary 3.2
Corollary 3.3
Proposition 3.1
...and 26 more
