A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC

Max Hird; Florian Maire; Jeffrey Negrea

TL;DR

This work provides the first non-asymptotic, quantitative analysis of learning and applying a matrix-valued preconditioner in MCMC. By introducing the $\sqrt{N}\varepsilon$-approximately IID in $W_2$ (AIID) framework, it bridges classical mixing-time intuition and modern non-asymptotic guarantees, enabling amortization of preconditioner learning costs across many samples. The authors derive contraction-based iteration complexities for thinned MCMC outputs and give explicit learning complexities for estimating the target covariance $\Sigma_\pi$ and Fisher information $\mathcal{F}$, with application to the Unadjusted Langevin Algorithm (ULA) under covariance and Fisher preconditioning. The total complexity splits into a learning phase and a sampling phase, showing that with sufficiently many final samples, preconditioning can yield net computational gains despite its upfront cost. Practical implications include guidance on when to learn a preconditioner (e.g., when the number of final samples $N$ is large) and the observation that moderate accuracy in the preconditioner suffices to reap efficiency benefits, with potential extensions to adaptive schemes beyond a single preconditioner.

Abstract

Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes that learn a preconditioner based on the target covariance or the expected Hessian of the target potential with those of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA which learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners, under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate independence condition, which we formalise, allows us to bridge the non-asymptotic bounds of modern MCMC theory and classical heuristics of effective sample size and mixing time, and is needed to amortise the costs of learning a preconditioner across the many samples it will be used to produce.
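
The two-phase structure described in the abstract (learn a preconditioner, then use it to produce $N$ thinned samples) can be illustrated with a minimal sketch. This is not the paper's algorithm or its tuning: the function names `ula_chain` and `learn_then_sample`, the Gaussian test target, the step sizes, the pilot-run length, and the thinning interval are all assumptions chosen for illustration. The preconditioned update used here is the standard form $X_{k+1} = X_k - hP\nabla U(X_k) + \sqrt{2h}\,P^{1/2}\xi_k$, with $P$ taken to be an empirical covariance from an unpreconditioned pilot run.

```python
import numpy as np

def ula_chain(grad_U, x0, step, n_steps, rng, precond=None):
    """Run (optionally preconditioned) ULA; return all iterates, shape (n_steps, d)."""
    d = x0.shape[0]
    P = np.eye(d) if precond is None else precond
    L = np.linalg.cholesky(P)  # P = L L^T, so L @ xi has covariance P
    x = x0.copy()
    out = np.empty((n_steps, d))
    for k in range(n_steps):
        xi = rng.standard_normal(d)
        x = x - step * (P @ grad_U(x)) + np.sqrt(2.0 * step) * (L @ xi)
        out[k] = x
    return out

def learn_then_sample(grad_U, x0, rng, n_learn, n_samples, thin,
                      step_plain, step_precond):
    # Learning phase (illustrative): unpreconditioned pilot run,
    # empirical covariance used as the preconditioner.
    pilot = ula_chain(grad_U, x0, step_plain, n_learn, rng)
    P_hat = np.cov(pilot, rowvar=False)
    # Sampling phase: preconditioned ULA, keeping every `thin`-th iterate.
    chain = ula_chain(grad_U, pilot[-1], step_precond,
                      n_samples * thin, rng, precond=P_hat)
    return P_hat, chain[thin - 1::thin]

# Hypothetical ill-conditioned Gaussian target N(0, diag(1, 100)),
# with potential U(x) = 0.5 * sum(x**2 / variances).
variances = np.array([1.0, 100.0])
grad_U = lambda x: x / variances

rng = np.random.default_rng(0)
P_hat, samples = learn_then_sample(
    grad_U, np.zeros(2), rng,
    n_learn=20_000, n_samples=500, thin=20,
    step_plain=0.1, step_precond=0.1,
)
```

The pilot run is the upfront cost; it is only worth paying when enough final samples are drawn afterwards, which is the amortisation trade-off the abstract refers to.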

Paper Structure (46 sections, 17 theorems, 146 equations, 2 algorithms)

Introduction
Related Work
Notation
Sampling Algorithms
Main Results
Technical Overview
Proof Sketch of Theorem \ref{thm:Wasserstein_contraction_complexity}
Proof Sketch of Theorem \ref{thm:preconditioner_learn_complexity}
Proof Sketch of Theorem \ref{thm:preconditioner_learn_complexity} Part 1.
Proof Sketch of Theorem \ref{thm:preconditioner_learn_complexity} Part 2.
Proof Sketch of Theorem \ref{thm:total_ULA_complexities}
Proof Sketch of Theorem \ref{thm:total_ULA_complexities} Part 1.
Proof Sketch of Theorem \ref{thm:total_ULA_complexities} Part 2.
Time to construct the preconditioner
Time to achieve the $\sqrt{N}\varepsilon$-AIID output $\{X_t\}_{t=1}^N$
...and 31 more sections

Key Result

Theorem 3

Let the underlying MCMC kernel for the thinned Markov chain sampler in Algorithm \ref{alg:thinned_Markov_chain} satisfy the Wasserstein contraction condition in Definition \ref{def:W_2_contraction} with $\Gamma \geq 1$. Let $\varepsilon>0$ be such that $\varepsilon \in ( 3\Gamma^2b, \sqrt{3\mathrm{tr}(\Sigma_\pi)} \ldots$

Theorems & Definitions (21)

Definition 1
Definition 2
Theorem 3
Theorem 4
Theorem 5
Proposition 6
Proposition 7
Proposition 8
Proposition 9
Proposition 10
...and 11 more
