
Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

TL;DR

The paper tackles constrained sampling from distributions with density $\pi(x) \propto e^{-f(x)}$ on a compact convex set $\mathcal{K}$ by introducing the Metropolis-adjusted Mirror Langevin algorithm (MAMLA). MAMLA augments a single step of the Mirror Langevin algorithm (MLA) with a Metropolis–Hastings filter, yielding a reversible Markov chain whose stationary distribution is exactly the target $\Pi$, so the method is unbiased. It comes with non-asymptotic mixing-time guarantees under standard assumptions: self-concordance of the mirror map, and relative convexity, smoothness, and Lipschitz continuity of $f$ with respect to it. The authors derive explicit $\delta$-mixing-time bounds in terms of the step size $h$, the dimension $d$, mirror-barrier parameters, and problem constants; these depend only logarithmically on $1/\delta$, exponentially better than unadjusted discretisations. They also show affine-invariant guarantees in the special Newton Langevin case. The framework covers practical applications, including uniform sampling over polytopes and intersections of ellipsoids and sampling from Dirichlet distributions, with concrete corollaries and a detailed discussion of implementation costs such as Hessian evaluations and Cholesky factorizations. Numerical experiments corroborate the theory, showing favorable mixing-time scaling, high acceptance rates, and clear advantages over the unadjusted MLA, and together these results offer a scalable, principled approach to constrained Bayesian inference and related tasks.
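As a compact summary of the update described above, assuming the form of MLA from Zhang et al. (2020) in which the Hessian of $\phi$ is evaluated at the current point, one MAMLA iteration from $x \in \mathcal{K}$ first proposes

$$
z = \nabla\phi^{*}\!\left(\nabla\phi(x) - h\,\nabla f(x) + \sqrt{2h}\,\big[\nabla^{2}\phi(x)\big]^{1/2}\xi\right), \qquad \xi \sim \mathcal{N}(0, I_{d}),
$$

where $\nabla\phi^{*} = (\nabla\phi)^{-1}$ maps back from the mirror (dual) space. A change of variables gives the proposal density, writing $\|v\|_{A}^{2} = v^{\top}Av$,

$$
p_{x}(z) = (4\pi h)^{-d/2}\,\frac{\det \nabla^{2}\phi(z)}{\sqrt{\det \nabla^{2}\phi(x)}}\,\exp\!\left(-\frac{1}{4h}\big\|\nabla\phi(z) - \nabla\phi(x) + h\,\nabla f(x)\big\|^{2}_{[\nabla^{2}\phi(x)]^{-1}}\right),
$$

and the Metropolis–Hastings filter moves to $z$ with probability

$$
\min\left\{1,\ \frac{e^{-f(z)}\,p_{z}(x)}{e^{-f(x)}\,p_{x}(z)}\right\},
$$

otherwise the chain stays at $x$. The normalising constant of $\pi$ cancels in the ratio, so only $f$ is needed. This is a reconstruction under the stated assumption, not a verbatim statement of the paper's algorithm.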

Abstract

We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings.
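The accept-reject construction in the abstract can be made concrete with a minimal NumPy sketch. Everything problem-specific here is an illustrative assumption rather than the paper's setup: the domain is the box $[0,1]^{d}$ (a simple polytope), the mirror map is its separable log-barrier, chosen because $\nabla\phi^{*}$ then has a closed form, and the target is a product of Beta distributions. Helper names such as `grad_phi_star` and `mamla_step` are hypothetical. The proposal is an MLA step with the Hessian evaluated at the current point, followed by the Metropolis–Hastings correction.

```python
import numpy as np

# Minimal MAMLA sketch (illustrative, not the authors' code) on the box
# K = [0, 1]^d with the separable log-barrier mirror map
#     phi(x) = -sum_i [log(x_i) + log(1 - x_i)].


def grad_phi(x):
    return -1.0 / x + 1.0 / (1.0 - x)


def hess_phi_diag(x):
    # This barrier's Hessian is diagonal, so only its diagonal is stored.
    return 1.0 / x**2 + 1.0 / (1.0 - x) ** 2


def grad_phi_star(y):
    # Inverse of grad_phi in closed form, applied coordinatewise; the value
    # lies in (0, 1) for every real y and equals 1/2 at y = 0.
    return 2.0 / (2.0 - y + np.sqrt(4.0 + y**2))


def log_proposal(z, x, grad_f, h):
    # Log density (up to a constant that cancels in the MH ratio) of proposing
    # z from x, where z = grad_phi_star(Y) and
    # Y ~ N(grad_phi(x) - h * grad_f(x), 2h * hess_phi(x)).
    # The first term is the change-of-variables factor det(hess_phi(z)).
    hx = hess_phi_diag(x)
    r = grad_phi(z) - grad_phi(x) + h * grad_f(x)
    return (np.sum(np.log(hess_phi_diag(z)))
            - 0.5 * np.sum(np.log(hx))
            - np.sum(r**2 / hx) / (4.0 * h))


def mamla_step(x, f, grad_f, h, rng):
    # 1) One Mirror Langevin step, taken in the dual (mirror) coordinates.
    xi = rng.standard_normal(x.shape)
    y = grad_phi(x) - h * grad_f(x) + np.sqrt(2.0 * h * hess_phi_diag(x)) * xi
    z = grad_phi_star(y)
    # Numerical safeguard: reject proposals that round onto the boundary.
    if not np.all((z > 0.0) & (z < 1.0)):
        return x, False
    # 2) Metropolis-Hastings filter for pi(x) proportional to exp(-f(x)).
    log_ratio = (f(x) - f(z)
                 + log_proposal(x, z, grad_f, h)
                 - log_proposal(z, x, grad_f, h))
    if np.log(rng.uniform()) < log_ratio:
        return z, True
    return x, False


# Hypothetical target: independent Beta(a, b) coordinates on [0, 1]^d.
a, b, d, h = 2.0, 5.0, 2, 0.05


def f(x):
    return -np.sum((a - 1.0) * np.log(x) + (b - 1.0) * np.log(1.0 - x))


def grad_f(x):
    return -(a - 1.0) / x + (b - 1.0) / (1.0 - x)


rng = np.random.default_rng(0)
x = np.full(d, 0.5)
n_iter, burn_in = 20_000, 1_000
samples, n_accept = [], 0
for k in range(n_iter):
    x, accepted = mamla_step(x, f, grad_f, h, rng)
    n_accept += accepted
    if k >= burn_in:
        samples.append(x)
samples = np.array(samples)
accept_rate = n_accept / n_iter
print(f"mean {samples.mean():.3f} vs target {a / (a + b):.3f}; "
      f"acceptance rate {accept_rate:.2f}")
```

Because the filter makes the chain reversible with respect to the target, the sample mean of each coordinate should approach $a/(a+b)$ as the run lengthens, whereas the abstract notes that unadjusted MLA carries an asymptotic bias. For a general polytope the log-barrier's $\nabla\phi^{*}$ typically has no closed form and would have to be computed numerically.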

Paper Structure

This paper contains 41 sections, 20 theorems, 144 equations, 7 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 3.1

Consider a distribution $\Pi$ with density $\pi(x) \propto e^{-f(x)}$ supported on a compact and convex set $\mathcal{K} \subset \mathbb{R}^{d}$, and a mirror map $\phi : \mathcal{K} \to \mathbb{R} \cup \{\infty\}$. If $f$ and $\phi$ satisfy the paper's assumptions, from self-concordance through relative Lipschitz continuity, then there exist universal constants $C^{(1)}, \ldots, C^{(5)}$ such that for any $0 < h \leq h_{\max}$, $\delta$ …
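The guarantees in Theorem 3.1 and the TL;DR are phrased as $\delta$-mixing times. A common definition, assumed here (the paper may measure distance or choose the initial distribution $\mu_{0}$ differently, for example through a warm start), is

$$
\tau_{\mathrm{mix}}(\delta;\mu_{0}) = \inf\big\{k \ge 0 : \|\mu_{0}\mathcal{T}^{k} - \Pi\|_{\mathrm{TV}} \le \delta\big\},
$$

where $\mathcal{T}$ denotes the MAMLA transition kernel. One standard route to a $\log(1/\delta)$ dependence for reversible chains goes through conductance: for a lazy reversible chain with conductance $\Phi$ started from an $M$-warm $\mu_{0}$, the Lovász–Simonovits bound gives

$$
\|\mu_{0}\mathcal{T}^{k} - \Pi\|_{\mathrm{TV}} \le \sqrt{M}\Big(1 - \frac{\Phi^{2}}{2}\Big)^{k} \le \sqrt{M}\,e^{-k\Phi^{2}/2}
\quad\Longrightarrow\quad
\tau_{\mathrm{mix}}(\delta;\mu_{0}) \le \Big\lceil \frac{2}{\Phi^{2}}\log\frac{\sqrt{M}}{\delta}\Big\rceil .
$$

This illustrates where the logarithmic dependence can come from; it is not claimed to be the paper's exact argument.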

Figures (7)

  • Figure 1: Progression of MAMLA on an $\mathsf{Ellipsoid}(M)$ with $d = 2$, $\lambda_{1}(M) = 1$, and $\lambda_{2}(M) = 4$. Points coloured in green are contained in $\mathsf{Ellipsoid}^{1/2}(M)$, and "Prop" is the proportion of points in each region. Note that $\widehat{\tau}_{\mathrm{mix}}$ is at most $60$ in this case.
  • Figure 2: (Empirical) Mixing time versus dimension. The orange line corresponds to $h \propto d^{-1}$, and the blue line corresponds to $h \propto d^{-3/2}$.
  • Figure 3: Progression of MAMLA for sampling from a Dirichlet distribution with $d = 2$ and $a_{i} = 6$ for $i \in [3]$.
  • Figure 4: (Empirical) Mixing time versus dimension. Orange corresponds to $h \propto d^{-3/2}$, and blue corresponds to $h \propto d^{-2}$.
  • Figure 5: Variation of the empirical $2$-Wasserstein distance with iterations for MAMLA and MLA.
  • ...and 2 more figures
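Figure 5 tracks an empirical 2-Wasserstein distance across iterations. How the paper computes it is not stated in this summary; as a hedged one-dimensional illustration only, when two empirical measures have equal sample sizes the optimal coupling pairs order statistics, so $W_{2}$ reduces to comparing sorted samples. The helper name `empirical_w2_1d` is hypothetical.

```python
import numpy as np


def empirical_w2_1d(xs, ys):
    """W2 distance between two 1-D empirical measures with equal sample sizes.

    In one dimension the optimal transport plan pairs the i-th smallest point
    of one sample with the i-th smallest point of the other, so
    W2^2 = mean((sort(xs) - sort(ys))**2).
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.ndim != 1 or xs.shape != ys.shape:
        raise ValueError("expects two 1-D samples of the same size")
    return float(np.sqrt(np.mean((xs - ys) ** 2)))


# Shifting a sample by c moves its empirical measure exactly |c| in W2.
print(empirical_w2_1d([2.0, 0.0, 1.0], [1.0, 3.0, 2.0]))  # 1.0
```

In higher dimensions there is no sorting shortcut, and an exact empirical $W_{2}$ requires solving a discrete optimal-transport problem.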

Theorems & Definitions (36)

  • Definition 2.1: nesterov2018lectures
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4: jiang2021mirror; ahn2021efficient
  • Definition 2.5: laddha2020strong
  • Theorem 3.1
  • Corollary 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Proposition 3.1
  • ...and 26 more