Large deviation-based tuning schemes for Metropolis-Hastings algorithms

Federica Milinanni

TL;DR

The paper tackles tuning Metropolis-Hastings algorithms through a large deviation framework for the empirical measure. It develops an alternative dual representation of the MH rate function and derives practical upper and lower bounds to quantify convergence speed without exact rate-function computation. Using these bounds, it proposes three large-deviation-based tuning schemes to identify near-optimal MH hyperparameters, demonstrated on an Independent MH example where tuning aligns the proposal with the target. The work provides a principled method for pre-calibrating MH-type algorithms and paves the way for applying these ideas to more advanced MH variants like MALA and HMC.
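As a rough illustration of the Independent MH setting mentioned above, here is a minimal sketch, not the paper's tuning scheme. It assumes a Gaussian target $\mathcal{N}(1,2^2)$ and a Gaussian proposal; the function names, the proposal family, and the use of the acceptance rate as a crude diagnostic are all assumptions made for illustration. The paper's schemes rely on rate-function bounds instead.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def independent_mh(n_steps, target_logpdf, prop_mean, prop_std, seed=0):
    """Independent MH with a Gaussian proposal N(prop_mean, prop_std^2).

    Hypothetical helper for illustration; returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)

    def log_weight(z):
        # Log importance weight: log target(z) - log proposal(z).
        return target_logpdf(z) - normal_logpdf(z, prop_mean, prop_std)

    x = rng.gauss(prop_mean, prop_std)  # initial state drawn from the proposal
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = rng.gauss(prop_mean, prop_std)  # proposal ignores the current state
        log_ratio = log_weight(y) - log_weight(x)
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        if math.log(u) <= log_ratio:  # accept with probability min(1, w(y)/w(x))
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def target_logpdf(x):
    # Assumed target for this sketch: N(1, 2^2).
    return normal_logpdf(x, 1.0, 2.0)


# Proposal equal to the target versus a shifted, narrower proposal.
samples_matched, rate_matched = independent_mh(5000, target_logpdf, 1.0, 2.0)
samples_mismatched, rate_mismatched = independent_mh(5000, target_logpdf, 4.0, 1.0)
```

When the proposal coincides with the target, the importance weights are constant and every proposal is accepted, which is the sense in which aligning the proposal with the target is the ideal outcome of tuning in this example.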

Abstract

Markov chain Monte Carlo (MCMC) methods are among the most popular classes of algorithms for sampling from a target probability distribution. A growing trend in recent years is to analyze the convergence of MCMC algorithms using tools from the theory of large deviations. In (Milinanni & Nyquist, 2024), a new framework based on this approach was developed to study the convergence of empirical measures associated with algorithms of Metropolis-Hastings type, a broad and popular sub-class of MCMC methods. The goal of this paper is to leverage these large deviation results to improve the efficiency of Metropolis-Hastings algorithms. Specifically, we use the large deviations rate function (a central object in large deviation theory) to quantify and characterize the algorithms' speed of convergence. We begin by extending the analysis from (Milinanni & Nyquist, 2024), deriving alternative representations of the rate function. Building on this, we establish explicit upper and lower bounds, which we then use to design schemes to tune Metropolis-Hastings algorithms.
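To make the link between the rate function and the speed of convergence concrete, recall the standard form of a large deviation principle for the empirical measures $\{L^n\}$. The symbol $I$ for the rate function is an assumed notation here. For every closed set $F\subset\mathcal{P}(S)$ and every open set $G\subset\mathcal{P}(S)$,

$$
\limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}(L^n\in F)\le-\inf_{\mu\in F}I(\mu),
\qquad
\liminf_{n\to\infty}\frac{1}{n}\log\mathbb{P}(L^n\in G)\ge-\inf_{\mu\in G}I(\mu).
$$

Heuristically, for the complement of an open ball around the target $\pi$,

$$
\mathbb{P}\big(L^n\in B_\varepsilon(\pi)^\complement\big)\approx\exp\Big(-n\inf_{\mu\in B_\varepsilon(\pi)^\complement}I(\mu)\Big),
$$

so a larger rate function away from $\pi$ corresponds to faster decay of the probability that $L^n$ stays far from $\pi$. This is why lower bounds on the rate function are natural quantities to maximize when tuning.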

Paper Structure (12 sections, 12 theorems, 82 equations, 19 figures, 4 algorithms)


Introduction
Preliminaries
Notation
The Metropolis-Hastings algorithm
Large deviation principle for Metropolis-Hastings Markov chains
Alternative representation of the rate function
Rate function upper and lower bounds
Rate function upper bounds from the relative entropy representation
Rate function lower bound from the Donsker-Varadhan representation
Rate function lower bound by the variational formula for the relative entropy
Tuning Metropolis-Hastings algorithms via the rate function lower bounds
An illustrative example: Tuning the Independent Metropolis-Hastings algorithm

Key Result

Theorem 1

Let $\{X_i\}_{i\ge0}$ be the Metropolis–Hastings chain and $K(x,dy)$ the associated transition kernel. Let $\{L^n\}_{n\ge1}\subset\mathcal{P}(S)$ be the corresponding sequence of empirical measures. Under Assumptions (A.1)–(A.3) (see milinanni2024a), $\{L^n\}_{n\ge1}$ satisfies a large deviation pri

Figures (19)

Figure 1: Illustration of the space $\mathcal{P}(S)$ of probability measures on a Polish space $S$, the target measure $\pi\in\mathcal{P}(S)$, the complement of the ball of radius $\varepsilon>0$ centered at $\pi$, $B_\varepsilon(\pi)^\complement$, and two random sequences of empirical measures $\{L^n\}$ corresponding to the behaviors with high and low probability (green and red dots, respectively). With high probability, for large $n$, the random empirical measure $L^n$ will be close to $\pi$, thus, $\mathbb{P}(L^n\in B_\varepsilon(\pi))\approx 1$ (green dots). Instead, $L^n\in B_\varepsilon(\pi)^\complement$ is a rare event (red dots) as $n$ grows, and the corresponding probability $\mathbb{P}(L^n\in B_\varepsilon(\pi)^\complement)$ decays to $0$ exponentially fast, as $n\to\infty$.
Figure 2: $\mu_1\sim\mathcal{N}(1,2^2)$
Figure 3: $\mu_2\sim \text{Weib}(3,2)$
Figure 4: $\mu_3\sim\text{Unif}([0,1])$
Figure 5: $\mu_4\sim\frac{1}{2}(\mathcal{N}(5,2^2)+\mathcal{N}(-3,1))$
...and 14 more figures

Theorems & Definitions (25)

Theorem 1: Theorem 4.1 in milinanni2024a
Remark 1
Proposition 2
proof
Proposition 3
proof
Proposition 4
proof
Theorem 5
Corollary 1: Rate function upper bound
...and 15 more
