Some aspects of robustness in modern Markov Chain Monte Carlo

Sam Power, Giorgos Vasdekis

TL;DR

The paper surveys robustness concerns in modern MCMC when target distributions display rough local geometry or heavy tails, framing these as two main pathologies (roughness and flatness) and evaluating a spectrum of remedies. It connects standard diffusion-based MCMC (e.g., overdamped Langevin, MALA) and PDMP-based methods to practical algorithms designed for stability, including Truncated, Tamed, Proximal, Barker, and non-quadratic kinetic-energy variants, as well as PDMC. For heavy tails, it analyzes space- and time-transformations as means to obtain lighter-tailed targets or faster tail exploration, with concrete examples such as Cauchy and Laplace-type targets. The work highlights open problems in local adaptation, discretisation strategies, and principled model problems. It argues for robust methods that retain their performance on well-behaved targets while degrading gracefully under these pathologies, which matters in practice for Bayesian computation and high-dimensional inference.
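The TL;DR lists the Barker proposal among the robust remedies. Purely as an illustration, here is a minimal one-dimensional sketch of a Barker proposal with a Metropolis-Hastings correction. It assumes the hypothetical potential U(x) = x^4/4 (our own choice, in the spirit of a 'polynomially-steep' potential but not necessarily the paper's Example 3.1). The function names, the step scale sigma and the starting point are also our own. The design point it shows is that the gradient only decides the direction of the Gaussian increment z, never its size, so a huge gradient cannot cause a huge jump.

```python
import math
import random


def log_target(x):
    # Hypothetical 'polynomially-steep' target: log pi(x) = -x**4 / 4 (up to a constant)
    return -x**4 / 4.0


def grad_log_target(x):
    return -x**3


def sigmoid(a):
    # Numerically stable logistic function 1 / (1 + exp(-a))
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def log1pexp(a):
    # Numerically stable log(1 + exp(a))
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def barker_mh(x0, sigma, n_iter, seed=0):
    """Barker proposal with Metropolis-Hastings correction (1-D sketch)."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_iter):
        gx = grad_log_target(x)
        z = rng.gauss(0.0, sigma)
        # Keep z with probability sigmoid(z * gx), otherwise flip its sign:
        # the gradient picks the direction, sigma alone sets the step size.
        w = z if rng.random() < sigmoid(z * gx) else -z
        y = x + w
        gy = grad_log_target(y)
        # Proposal density q(x, y) = 2 * phi(w) / (1 + exp(-w * gx)); the
        # Gaussian factors cancel, leaving log q(y, x) - log q(x, y):
        log_q_ratio = log1pexp(-w * gx) - log1pexp(w * gy)
        log_alpha = log_target(y) - log_target(x) + log_q_ratio
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_iter
```

For example, barker_mh(10.0, 1.0, 30000, seed=1) starts where the gradient is of order 10^3. The chain still moves by steps of order sigma, whereas a plain gradient step x + h * grad log pi(x) would overshoot badly unless h were tiny.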

Abstract

Markov Chain Monte Carlo (MCMC) is a flexible approach to approximate sampling from intractable probability distributions, with a rich theoretical foundation and comprising a wealth of exemplar algorithms. While the qualitative correctness of MCMC algorithms is often easy to ensure, their practical efficiency is contingent on the 'target' distribution being reasonably well-behaved. In this work, we concern ourselves with the scenario in which this good behaviour is called into question, reviewing an emerging line of work on 'robust' MCMC algorithms which can perform acceptably even in the face of certain pathologies. We focus on two particular pathologies which, while simple, can already have dramatic effects on standard 'local' algorithms. The first is roughness, whereby the target distribution varies so rapidly that the numerical stability of the algorithm is tenuous. The second is flatness, whereby the landscape of the target distribution is instead so barren and uninformative that one becomes lost in uninteresting parts of the state space. In each case, we formulate the pathology in concrete terms, review a range of proposed algorithmic remedies to the pathology, and outline promising directions for future research.
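To make the roughness pathology concrete, the following sketch compares the unadjusted Langevin algorithm (ULA) with one common "tamed" variant. ULA updates as x' = x - h U'(x) + sqrt(2h) xi, while the tamed variant updates as x' = x - h U'(x)/(1 + h|U'(x)|) + sqrt(2h) xi. The setting is an assumption of ours: the potential U(x) = x^4/4, step size h = 0.1 and start x0 = 5. The helper names are hypothetical. Because |x - h x^3| > |x| whenever |x| > sqrt(2/h), the untamed iterates explode. The tamed drift increment, by contrast, never exceeds 1 in size.

```python
import math
import random


def grad_U(x):
    # Hypothetical polynomially-steep potential U(x) = x**4 / 4, so U'(x) = x**3
    return x**3


def run_langevin(x0, h, n_iter, tamed, seed=0, blowup=1e6):
    """ULA or tamed ULA in 1-D; stops early once |x| exceeds `blowup`."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_iter):
        drift = -grad_U(x)
        if tamed:
            # Tamed drift: h * drift / (1 + h * |drift|) always has size < 1
            step = h * drift / (1.0 + h * abs(drift))
        else:
            # Plain Euler-Maruyama (ULA) drift step
            step = h * drift
        x = x + step + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        path.append(x)
        if abs(x) > blowup:
            break  # treat as numerical divergence
    return path
```

Calling run_langevin(5.0, 0.1, 2000, tamed=False) should exceed the blow-up threshold within a few iterations. The same call with tamed=True should drift back towards the origin and stay there.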

Paper Structure

This paper contains 26 sections, 66 equations, 29 figures.

Figures (29)

  • Figure 1: MALA algorithm on a two-dimensional correlated Gaussian target (equation eq:2d.Gauss). Step-size $h=0.35$. Number of iterations $N=10^4$. Starting point $x_0=(4,5)$.
  • Figure 2: Randomized Hamiltonian Monte Carlo (RHMC) algorithm on a two-dimensional correlated Gaussian target (equation eq:2d.Gauss). Correlation parameter $\rho=0$. Rate $\lambda=0.2$. Number of iterations $N=10^4$. Starting point $x_0=(4,5)$.
  • Figure 3: Bouncy Particle Sampler (BPS) on a bivariate correlated Gaussian target (equation eq:2d.Gauss). Number of direction switches: $N = 10^4$. Starting point: $x_0 = (4, 5)$. Refresh rate $\lambda = 0.66$.
  • Figure 4: Zig-Zag Sampler (ZZS) on a two-dimensional correlated Gaussian target (equation eq:2d.Gauss). Number of direction switches: $N = 10^4$. Starting point: $x_0 = (4, 5)$.
  • Figure 5: Negative log-densities of several distributions. The left plot compares the growth of the Laplace potential with that of lighter-tailed densities; the right plot shows the growth for heavier-tailed densities.
  • ...and 24 more figures
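Figure 1 shows MALA on the two-dimensional correlated Gaussian target with step size h = 0.35 and starting point x0 = (4, 5). Equation eq:2d.Gauss is not reproduced on this page, so the sketch below makes two assumptions. First, the target is a zero-mean Gaussian with unit variances and correlation RHO = 0.5. Second, the proposal convention is y = x + h grad log pi(x) + sqrt(2h) xi. The paper may use a different correlation or scale the step differently (e.g. h/2 in the drift). All function names are hypothetical.

```python
import math
import random

RHO = 0.5  # assumed correlation; not taken from the paper


def log_target(x):
    # Assumed zero-mean bivariate Gaussian, unit variances, correlation RHO
    a, b = x
    return -(a * a - 2 * RHO * a * b + b * b) / (2 * (1 - RHO**2))


def grad_log_target(x):
    a, b = x
    c = 1 - RHO**2
    return ((-a + RHO * b) / c, (-b + RHO * a) / c)


def log_q(x, y, h):
    # Log density (up to a constant) of proposing y from x under
    # y = x + h * grad log pi(x) + sqrt(2h) * N(0, I)
    g = grad_log_target(x)
    return -sum((yi - xi - h * gi) ** 2 for xi, yi, gi in zip(x, y, g)) / (4 * h)


def mala(x0, h, n_iter, seed=0):
    """Metropolis-adjusted Langevin algorithm (2-D sketch)."""
    rng = random.Random(seed)
    x, samples, accepted = tuple(x0), [], 0
    for _ in range(n_iter):
        g = grad_log_target(x)
        # Langevin proposal: gradient drift plus Gaussian noise
        y = tuple(xi + h * gi + math.sqrt(2 * h) * rng.gauss(0.0, 1.0)
                  for xi, gi in zip(x, g))
        # Metropolis-Hastings correction for the asymmetric proposal
        log_alpha = (log_target(y) + log_q(y, x, h)
                     - log_target(x) - log_q(x, y, h))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_iter
```

With mala((4.0, 5.0), 0.35, 10000), the empirical means should settle near zero and the empirical correlation near RHO once the initial transient is discarded.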

Theorems & Definitions (9)

  • Example 3.1: 'Polynomially-Steep' Potential
  • Example 3.2: 'Locally-Sharp' Potential
  • Example 3.3: Divergent Potential with Natural Boundary
  • Example 3.4: Nice Potential, Artificial Boundary
  • Example 3.5: Connections with ULA
  • Definition 4.1
  • Example 4.1: Cauchy distribution
  • Example 4.2: Cauchy Regression with Horseshoe Prior
  • Example 4.3: Cauchy target
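Examples 4.1 and 4.3 concern the Cauchy target, and the TL;DR mentions space transformations that yield lighter-tailed targets. One simple such transformation is x = sinh(y). We use it here purely as an illustration; it is not necessarily the one analysed in the paper. The standard Cauchy density 1/(pi (1 + x^2)) pulls back to p_Y(y) = cosh(y) / (pi (1 + sinh(y)^2)) = 1/(pi cosh(y)), whose tails decay like exp(-|y|), i.e. they are Laplace-type. The sketch runs random-walk Metropolis on y and maps each state back through sinh. The step size 2.0 and the function names are our own hypothetical choices.

```python
import math
import random


def log_density_y(y):
    # Standard Cauchy pushed through x = sinh(y): p_Y(y) = 1 / (pi * cosh(y)).
    # Uses the stable identity log cosh(y) = |y| + log1p(exp(-2|y|)) - log 2.
    a = abs(y)
    log_cosh = a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)
    return -log_cosh - math.log(math.pi)


def rwm_transformed(n_iter, step=2.0, y0=0.0, seed=0):
    """Random-walk Metropolis on the Laplace-type transformed target (sketch)."""
    rng = random.Random(seed)
    y, xs, accepted = y0, [], 0
    for _ in range(n_iter):
        prop = y + step * rng.gauss(0.0, 1.0)
        log_alpha = log_density_y(prop) - log_density_y(y)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            y, accepted = prop, accepted + 1
        xs.append(math.sinh(y))  # map back to the original Cauchy scale
    return xs, accepted / n_iter
```

As a quick check, a standard Cauchy has median 0 and P(|X| < 1) = 1/2, so both quantities can be compared against the output of rwm_transformed(50000).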