Large deviation-based tuning schemes for Metropolis-Hastings algorithms

Federica Milinanni

TL;DR

The paper tackles tuning Metropolis-Hastings algorithms through a large deviation framework for the empirical measure. It develops an alternative dual representation of the MH rate function and derives practical upper and lower bounds to quantify convergence speed without exact rate-function computation. Using these bounds, it proposes three large-deviation-based tuning schemes to identify near-optimal MH hyperparameters, demonstrated on an Independent MH example where tuning aligns the proposal with the target. The work provides a principled method for pre-calibrating MH-type algorithms and paves the way for applying these ideas to more advanced MH variants like MALA and HMC.
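To make the Independent MH setting concrete, here is a minimal sketch in Python. It is not the paper's tuning scheme: the standard normal target, the Gaussian proposal family, and the function names are assumptions chosen for illustration, and the acceptance rate is only a crude proxy for the rate-function bounds the paper actually uses. It shows the qualitative point that a proposal aligned with the target behaves better than a misaligned one.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def independent_mh(n_steps, prop_mean, prop_std, seed=0):
    """Independent MH targeting N(0, 1) with proposal N(prop_mean, prop_std^2).

    Illustrative assumption: target and proposal family are not taken
    from the paper. Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)

    def log_weight(x):
        # Importance weight w(x) = pi(x) / q(x), on the log scale.
        return log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, prop_mean, prop_std)

    x = rng.gauss(prop_mean, prop_std)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = rng.gauss(prop_mean, prop_std)  # proposal ignores the current state
        # Accept with probability min(1, w(y) / w(x)).
        log_alpha = log_weight(y) - log_weight(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


# Proposal equal to the target versus a proposal shifted away from it.
_, acc_matched = independent_mh(5000, prop_mean=0.0, prop_std=1.0)
_, acc_shifted = independent_mh(5000, prop_mean=3.0, prop_std=1.0)
print(f"matched proposal acceptance: {acc_matched:.3f}")
print(f"shifted proposal acceptance: {acc_shifted:.3f}")
```

When the proposal coincides with the target, the importance weights are constant and every proposal is accepted, so the chain reduces to i.i.d. sampling from the target; the shifted proposal rejects often and the chain moves rarely.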

Abstract

Markov chain Monte Carlo (MCMC) methods are one of the most popular classes of algorithms for sampling from a target probability distribution. A rising trend in recent years consists in analyzing the convergence of MCMC algorithms using tools from the theory of large deviations. In (Milinanni & Nyquist, 2024), a new framework based on this approach has been developed to study the convergence of empirical measures associated with algorithms of Metropolis-Hastings type, a broad and popular sub-class of MCMC methods. The goal of this paper is to leverage these large deviation results to improve the efficiency of Metropolis-Hastings algorithms. Specifically, we use the large deviations rate function (a central object in large deviation theory) to quantify and characterize the algorithms' speed of convergence. We begin by extending the analysis from (Milinanni & Nyquist, 2024), deriving alternative representations of the rate function. Building on this, we establish explicit upper and lower bounds, which we then use to design schemes to tune Metropolis-Hastings algorithms.

Paper Structure

This paper contains 12 sections, 12 theorems, 82 equations, 19 figures, 4 algorithms.

Key Result

Theorem 1

Let $\{X_i\}_{i\ge0}$ be the Metropolis–Hastings chain and $K(x,dy)$ the associated transition kernel. Let $\{L^n\}_{n\ge1}\subset\mathcal{P}(S)$ be the corresponding sequence of empirical measures. Under Assumptions (A.1)–(A.3) (see Milinanni & Nyquist, 2024), $\{L^n\}_{n\ge1}$ satisfies a large deviation principle …
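The theorem statement is cut off in this excerpt, so the specific rate function is not reproduced here. For orientation, the standard definition of a large deviation principle with speed $n$ is recalled below; the symbols $I$ and $A$ are generic notation introduced for this sketch, not taken from the paper. The sequence $\{L^n\}_{n\ge1}$ satisfies a large deviation principle with rate function $I:\mathcal{P}(S)\to[0,\infty]$ (lower semicontinuous) if, for every measurable $A\subset\mathcal{P}(S)$,

$$
-\inf_{\mu\in A^\circ} I(\mu)
\;\le\; \liminf_{n\to\infty}\frac{1}{n}\log\mathbb{P}(L^n\in A)
\;\le\; \limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}(L^n\in A)
\;\le\; -\inf_{\mu\in\bar{A}} I(\mu),
$$

where $A^\circ$ and $\bar{A}$ denote the interior and closure of $A$. Taking $A=B_\varepsilon(\pi)^\complement$ gives the heuristic $\mathbb{P}(L^n\in B_\varepsilon(\pi)^\complement)\approx\exp\big(-n\inf_{\mu\in B_\varepsilon(\pi)^\complement} I(\mu)\big)$, so a larger rate function away from $\pi$ corresponds to faster convergence of the empirical measure.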

Figures (19)

  • Figure 1: Illustration of the space $\mathcal{P}(S)$ of probability measures on a Polish space $S$, the target measure $\pi\in\mathcal{P}(S)$, the complement of the ball of radius $\varepsilon>0$ centered at $\pi$, $B_\varepsilon(\pi)^\complement$, and two random sequences of empirical measures $\{L^n\}$ corresponding to the behaviors with high and low probability (green and red dots, respectively). With high probability, for large $n$, the random empirical measure $L^n$ will be close to $\pi$, thus, $\mathbb{P}(L^n\in B_\varepsilon(\pi))\approx 1$ (green dots). Instead, $L^n\in B_\varepsilon(\pi)^\complement$ is a rare event (red dots) as $n$ grows, and the corresponding probability $\mathbb{P}(L^n\in B_\varepsilon(\pi)^\complement)$ decays to $0$ exponentially fast, as $n\to\infty$.
  • Figure 2: $\mu_1\sim\mathcal{N}(1,2^2)$
  • Figure 3: $\mu_2\sim \text{Weib}(3,2)$
  • Figure 4: $\mu_3\sim\text{Unif}([0,1])$
  • Figure 5: $\mu_4\sim\frac{1}{2}(\mathcal{N}(5,2^2)+\mathcal{N}(-3,1))$
  • ...and 14 more figures
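
The exponential decay described in the caption of Figure 1 can be illustrated numerically. The sketch below makes several simplifying assumptions that are not from the paper: it uses i.i.d. draws from $\text{Unif}([0,1])$ as a stand-in for an MH chain, measures the distance between $L^n$ and $\pi$ with the Kolmogorov–Smirnov distance, and picks $\varepsilon=0.1$ arbitrarily. It estimates $\mathbb{P}(L^n\in B_\varepsilon(\pi)^\complement)$ by Monte Carlo for increasing $n$.

```python
import random


def ks_distance_to_uniform(samples):
    """Kolmogorov-Smirnov distance between the empirical CDF and F(x) = x on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    # The supremum is attained just before or at one of the sample points.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def estimate_deviation_probability(n, eps, n_replicates=200, seed=0):
    """Monte Carlo estimate of P(d_KS(L^n, pi) > eps) for i.i.d. Unif([0, 1]) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        samples = [rng.random() for _ in range(n)]
        if ks_distance_to_uniform(samples) > eps:
            hits += 1
    return hits / n_replicates


eps = 0.1
estimates = {n: estimate_deviation_probability(n, eps) for n in (50, 200, 800)}
for n, p in estimates.items():
    print(f"n = {n:4d}: estimated P(L^n outside the eps-ball) = {p:.3f}")
```

The estimated probability should drop sharply as $n$ grows, matching the picture of the rare event becoming exponentially unlikely. For a correlated MH chain the decay rate would depend on the kernel, which is what the rate function quantifies.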

Theorems & Definitions (25)

  • Theorem 1: Theorem 4.1 in (Milinanni & Nyquist, 2024)
  • Remark 1
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Theorem 5
  • Corollary 1: Rate function upper bound
  • ...and 15 more