High-accuracy sampling from constrained spaces with the Metropolis-adjusted Preconditioned Langevin Algorithm

Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

TL;DR

This work introduces MAPLA, a Metropolis-adjusted preconditioned Langevin sampler for distributions whose support is a convex set $\mathcal{K}$. MAPLA uses a single step of the preconditioned Langevin algorithm (PLA) with a geometry-aware metric $\mathscr{G}$ as the proposal inside a Metropolis-Hastings filter, so the resulting chain is reversible and unbiased for targets with density $\pi(x) \propto e^{-f(x)}$ on $\mathcal{K}$. The authors establish non-asymptotic mixing-time guarantees under self-concordant and stronger self-concordant++ conditions, with better dimension dependence when stronger curvature and symmetry conditions hold, and they also derive results for exponential densities (linear potentials) restricted to $\mathcal{K}$. Numerical experiments on Dirichlet sampling and Bayesian logistic regression show the practical benefit of incorporating gradient information via the natural gradient, compared with purely geometry-based walks such as DikinWalk. Overall, the paper provides a rigorous, geometry-driven framework for high-accuracy constrained sampling with provable mixing-time guarantees and empirical validation.

Abstract

In this work, we propose a first-order sampling method called the Metropolis-adjusted Preconditioned Langevin Algorithm for approximate sampling from a target distribution whose support is a proper convex subset of $\mathbb{R}^{d}$. Our proposed method is the result of applying a Metropolis-Hastings filter to the Markov chain formed by a single step of the preconditioned Langevin algorithm with a metric $\mathscr{G}$, and is motivated by the natural gradient descent algorithm for optimisation. We derive non-asymptotic upper bounds for the mixing time of this method for sampling from target distributions whose potentials are bounded relative to $\mathscr{G}$, and for exponential distributions restricted to the support. Our analysis suggests that if $\mathscr{G}$ satisfies stronger notions of self-concordance introduced in Kook and Vempala (2024), then these mixing time upper bounds have a strictly better dependence on the dimension than when $\mathscr{G}$ is merely self-concordant. We also provide numerical experiments that demonstrate the practicality of our proposed method. Our method is a high-accuracy sampler due to the polylogarithmic dependence on the error tolerance in our mixing time upper bounds.
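
The construction described in the abstract can be illustrated with a minimal sketch. It assumes the standard form of a preconditioned Langevin step, $z = x - h\,\mathscr{G}(x)^{-1}\nabla f(x) + \sqrt{2h}\,\mathscr{G}(x)^{-1/2}\xi$ with $\xi \sim \mathcal{N}(0, I)$, followed by a Metropolis-Hastings accept/reject test that also rejects proposals outside $\mathcal{K}$. The choice of a box for $\mathcal{K}$, the log-barrier metric, the linear potential, and all function names (`log_barrier_metric`, `log_proposal`, `mapla_step`, `run_chain`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def log_barrier_metric(A, b):
    """Assumed metric: Hessian of the log-barrier of {x : Ax < b}."""
    def G(x):
        s = b - A @ x                       # slacks, positive in the interior
        return (A / s[:, None] ** 2).T @ A  # sum_i a_i a_i^T / s_i^2
    return G

def log_proposal(x, z, grad_f, G, h):
    """Log-density at z of N(x - h G(x)^{-1} grad f(x), 2h G(x)^{-1}),
    up to an additive constant that cancels in the acceptance ratio."""
    Gx = G(x)
    mean = x - h * np.linalg.solve(Gx, grad_f(x))
    r = z - mean
    _, logdet = np.linalg.slogdet(Gx)
    return 0.5 * logdet - (r @ Gx @ r) / (4.0 * h)

def mapla_step(x, f, grad_f, G, in_interior, h, rng):
    """One Metropolis-adjusted step with a single PLA step as the proposal."""
    Gx = G(x)
    mean = x - h * np.linalg.solve(Gx, grad_f(x))
    L = np.linalg.cholesky(Gx)              # Gx = L L^T
    # L^{-T} xi has covariance G(x)^{-1}
    noise = np.linalg.solve(L.T, rng.standard_normal(x.size))
    z = mean + np.sqrt(2.0 * h) * noise
    if not in_interior(z):                  # proposals outside K are rejected
        return x, False
    log_ratio = (log_proposal(z, x, grad_f, G, h) - f(z)) \
              - (log_proposal(x, z, grad_f, G, h) - f(x))
    if np.log(rng.uniform()) < log_ratio:
        return z, True
    return x, False

def run_chain(n_steps=300, h=0.05, seed=0):
    """Toy run: exponential density exp(-c.x) restricted to the box [-1, 1]^2."""
    rng = np.random.default_rng(seed)
    A = np.vstack([np.eye(2), -np.eye(2)])
    b = np.ones(4)
    c = np.array([1.0, 0.0])
    f = lambda x: c @ x                     # linear potential (assumed example)
    grad_f = lambda x: c
    G = log_barrier_metric(A, b)
    in_interior = lambda z: bool(np.all(A @ z < b))
    x = np.zeros(2)
    samples = np.empty((n_steps, 2))
    accepted = 0
    for k in range(n_steps):
        x, ok = mapla_step(x, f, grad_f, G, in_interior, h, rng)
        samples[k] = x
        accepted += ok
    return samples, accepted / n_steps
```

Calling `run_chain()` returns the chain's iterates and its empirical acceptance rate. Because the proposal covariance scales with $\mathscr{G}(x)^{-1}$, steps shrink automatically near the boundary of the box; the step size `h` here is an arbitrary choice and is not tuned according to the paper's theory.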

Paper Structure

This paper contains 49 sections, 24 theorems, 213 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

Consider a distribution $\Pi$ supported over $\mathcal{K}$ that is a closed, convex subset of $\mathbb{R}^{d}$ whose density is $\pi(x) \propto e^{-f(x)}$. Let the metric $\mathscr{G} : \mathrm{int}(\mathcal{K}) \to \mathbb{S}_{+}^{d}$ be self-concordant and $\nu$-symmetric, and assume that the potential $f :$ … For precision $\delta \in (0, 1/2)$ and warmness parameter $M \geq 1$, if the step size $h$ is boun…

Figures (5)

  • Figure 1: Variation of empirical mixing time, computed with $\widetilde{W_{2}^{2}}$ (left) and $\mathrm{ED}$ (right) for both MAPLA and DikinWalk. The dashed and dotted lines correspond to $C_{h} = 0.1$ and $0.2$ respectively. The ordinates of the markers indicate the average empirical mixing time over 20 simulations.
  • Figure 2: Variation of $\textsf{dist}(\widehat{\mathbb{T}}^{k}_{\textsf{alg}}, \widehat{\Pi})$ for $\textsf{dist} = \widetilde{W_{2}^{2}}$ (left) and $\mathrm{ED}$ (right) with iteration $k$. For the plots showing the variation of $\widetilde{W_{2}^{2}}(\widehat{\mathbb{T}}^{k}_{}, \widehat{\Pi})$, the $x$-axis is truncated to $1000$ as the values converged.
  • Figure 3: Variation of $\widehat{R}_{\mathrm{accept}}$ with dimension $d$ when stepsize $h \propto d^{-\gamma}$.
  • Figure 4: Average variation of $\textsf{meas}_{k}$ for $\textsf{meas} = \widehat{\mathrm{Err}}$ (left) and $\widehat{\mathrm{NLL}}$ (right) with iteration $k$. The faint lines depict the variation over iterations for each simulation.
  • Figure 5: Variation in the IQR of $\widehat{\mathrm{Diff}}_{k}$ with iteration $k$. The upper and lower bars indicate the 75th and 25th percentiles of $\widehat{\mathrm{Diff}}_{k}$ respectively.

Theorems & Definitions (53)

  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Definition 4.4
  • Definition 4.5
  • Definition 4.6
  • Definition 4.7
  • Definition 4.8
  • Theorem 4.1
  • Theorem 4.2
  • ...and 43 more