
Central Limit Theorem for ergodic averages of Markov chains \& the comparison of sampling algorithms for heavy-tailed distributions

Miha Brešar, Aleksandar Mijatović, Gareth Roberts

TL;DR

This work builds a discrete-time L-drift framework (V,φ,Ψ) to derive verifiable necessary conditions for the ergodic CLT of Markov chains on general state spaces, and to obtain sharp lower bounds on convergence rates and on the tails of the invariant measure. It then applies this theory to a wide class of sampling algorithms for heavy-tailed targets (RWM, iv-RWM, MALA, ULA, SPS, IS), distinguishing single-jump and many-jump tail exploration regimes and showing how algorithm design and tail assumptions determine CLT validity and convergence speed. The results provide practical criteria for algorithm selection and highlight the limitations and biases of unadjusted schemes when targeting heavy-tailed distributions, while identifying scenarios where certain methods (e.g., SPS or iv-RWM with infinite-variance proposals) yield substantial gains. Overall, the paper advances the theoretical understanding of CLTs for MCMC with heavy-tailed targets and offers guidance for designing and comparing sampling algorithms in heavy-tailed settings.

Abstract

Establishing central limit theorems (CLTs) for ergodic averages of Markov chains is a fundamental problem in probability and its applications. Since the seminal work~\cite{MR834478}, a vast literature has emerged on sufficient conditions for such CLTs. To counterbalance this, the present paper provides verifiable necessary conditions for CLTs of ergodic averages of Markov chains on general state spaces. Our theory is based on drift conditions, which also yield lower bounds on the rates of convergence to stationarity in various metrics. The validity of the ergodic CLT is of particular importance for sampling algorithms, where it underpins the error analysis of estimators in Bayesian statistics and machine learning. Although heavy-tailed sampling is of central importance in applications, the characterisation of the CLT and the convergence rates are theoretically poorly understood for almost all practically used Markov chain Monte Carlo (MCMC) algorithms. In this setting our results provide sharp conditions on the validity of the ergodic CLT and establish convergence rates for large families of MCMC sampling algorithms for heavy-tailed targets. Our study includes a rather complete analysis of random walk Metropolis samplers (with finite- and infinite-variance proposals), Metropolis-adjusted and unadjusted Langevin algorithms and the stereographic projection sampler (as well as the independence sampler). By providing these sharp results via our practical drift conditions, our theory offers significant insights into the problems of algorithm selection and comparison for sampling heavy-tailed distributions (see short YouTube presentations~\cite{YouTube_talk} describing our \href{https://youtu.be/m2y7U4cEqy4}{\underline{theory}} and \href{https://youtu.be/w8I_oOweuko}{\underline{applications}}).
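
For orientation, the statement whose validity is at stake can be written in a standard form. The notation here is an assumption following common usage: $(X_k)$ is the Markov chain, $\pi$ its invariant distribution, $g$ the function being averaged, and $\sigma_g^2$ the asymptotic variance. The paper's own normalisation of $S_n(g)$ may differ.

\[
S_n(g) = \frac{1}{n}\sum_{k=1}^{n} g(X_k),
\qquad
\sqrt{n}\,\bigl(S_n(g) - \pi(g)\bigr) \xrightarrow{\;d\;} N(0,\sigma_g^2),
\qquad
\pi(g) = \int g \,\mathrm{d}\pi, \quad \sigma_g^2 < \infty .
\]

Sufficient conditions guarantee this convergence. Necessary conditions of the kind described above instead identify combinations of chain, target and $g$ for which it cannot hold.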

Paper Structure

This paper contains 44 sections, 32 theorems, 203 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.1

Let Assumption \ref{sub_drift_conditions} hold.

Figures (5)

  • Figure 1: Panels (A), (B) and (C) present QQ-plots of $10^4$ normalised ergodic averages estimating the probability $\pi(\cdot\geq 2)$ via ULA, MALA and RWM with $\alpha$-stable proposal ($\alpha=0.1$), respectively. $\pi$ is a $t$-distribution on ${\mathbb R}$ with $v=1$ degree of freedom, see \ref{eq:student_t_def} below. Each ergodic average was computed using $2\cdot 10^8$ steps.
  • Figure 2: QQ-plots of $5\cdot10^4$ ergodic averages $S_n(g)$ (see \ref{eq:ergodic_average} above) for the function $g(x)= |x|$ after $n=10^8$ steps of the RWM chains with Gaussian proposals in panel \ref{fig:RWM_QQ_light_tail_ex} and $t(0.05)$ proposals in panel \ref{fig:RWM_QQ_light_mean_ex} targeting the $t$-distribution $t(3)$ on ${\mathbb R}$. (See \ref{eq:student_t_def} for the definition of the $t$-distribution.) The QQ-plots in panels \ref{fig:RWM_QQ_light_tail_ex} and \ref{fig:RWM_QQ_light_mean_ex} show that the CLT fails in the first and holds in the second case, which agrees with the conclusion of Theorem \ref{thm:RWM_light}\ref{thm:RWM_light_CLT} since $v-2=1<2=2s<v-0.05 = 2.95$.
  • Figure 3: QQ-plots of $5\cdot10^4$ ergodic averages $S_n(g)$ (see \ref{eq:ergodic_average} above) for the function $g(x)= \mathbbm{1}\{|x|\geq 5\}$ after $n=10^8$ steps of the RWM chains with Gaussian proposals in panel \ref{fig:RWM_heavy_tail_ex} and $t(0.05)$ proposals in panel \ref{fig:RWM_heavy_mean_ex} targeting the 20-dimensional $t$-distribution $t(1)$. (See \ref{eq:student_t_def} for the definition of the $t$-distribution.) The QQ-plots in panels \ref{fig:RWM_heavy_tail_ex} and \ref{fig:RWM_heavy_mean_ex} show that the CLT fails in the first and holds in the second case, which agrees with the conclusion of Theorem \ref{thm:RWM_light}\ref{thm:RWM_light_CLT} since $0.05 < v<2$.
  • Figure 4: QQ-plots of $5\cdot10^3$ ergodic averages $S_n(g)$ (see \ref{eq:ergodic_average} above) for the function $g(x)= \mathbbm{1}_{[2,\infty)}(|x|)$ after $n=6\cdot10^6$ steps of the SPS targeting the $t$-distribution $t(1)$ in \ref{eq:student_t_def} in dimension $d=1$ (resp. $d=4$). The QQ-plot in panel \ref{fig:SPS_dim1_ex} (resp. \ref{fig:SPS_dim4_ex}) shows that the CLT holds (resp. fails), which agrees with Theorem \ref{thm:stereographic}\ref{thm:sps_CLT} since $v=1>1/2=d/2$ (resp. $v=1<2=d/2$).
  • Figure 5: QQ-plots of $10^4$ ergodic averages $S_n(g)$ (see \ref{eq:ergodic_average} above) for the function $g(x)=|x|$ after $n=5\cdot10^7$ steps of the ULA chain (resp. MALA chain) targeting the $t$-distribution $t(3)$ on ${\mathbb R}$ in panel \ref{fig:ULA_mean_ex} (resp. \ref{fig:MALA_mean_ex}). (See \ref{eq:student_t_def} for the definition of the $t$-distribution.) The QQ-plots in panels \ref{fig:ULA_mean_ex} and \ref{fig:MALA_mean_ex} show that the CLTs fail, which agrees with the conclusion of Theorems \ref{thm:ULA} and \ref{thm:MALA_light} since $v-2=1<2=2s$.
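
The kind of experiment summarised in these captions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the one-dimensional $t$-density $\pi(x)\propto(1+x^2/v)^{-(v+1)/2}$ (the paper's own definition is not reproduced here and may be parametrised differently). It takes $S_n(g)$ to be the sample mean of $g$ along the chain, and it uses a Gaussian proposal with an arbitrary step size. The names `log_student_t` and `rwm_ergodic_average` are hypothetical.

```python
import numpy as np


def log_student_t(x, v):
    """Unnormalised log-density of a 1-D t-distribution with v degrees of freedom.

    Assumed form: pi(x) proportional to (1 + x^2 / v)^(-(v + 1) / 2).
    """
    return -0.5 * (v + 1.0) * np.log1p(x * x / v)


def rwm_ergodic_average(g, v, n_steps, step=1.0, x0=0.0, seed=0):
    """Run random walk Metropolis with Gaussian proposals and return the
    sample mean of g along the chain (assumed here to play the role of S_n(g))."""
    rng = np.random.default_rng(seed)
    x = x0
    logp = log_student_t(x, v)
    total = 0.0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred at the current state.
        y = x + step * rng.standard_normal()
        logq = log_student_t(y, v)
        # Metropolis accept/reject step with acceptance probability min(1, pi(y)/pi(x)).
        if rng.uniform() < np.exp(min(0.0, logq - logp)):
            x, logp = y, logq
        total += g(x)
    return total / n_steps


if __name__ == "__main__":
    # Illustrative run: g(x) = |x|, target t(3), far fewer steps than in the figures.
    estimate = rwm_ergodic_average(abs, v=3.0, n_steps=50_000, seed=1)
    print("ergodic average of |x|:", estimate)
```

Repeating such a run over many independent seeds and comparing the centred, rescaled averages with Gaussian quantiles would give a QQ-plot of the type described in Figure 2. The step counts in the captions ($n=10^8$ and similar) are far larger than anything this sketch is meant to run.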

Theorems & Definitions (80)

  • Theorem 2.1
  • Theorem 2.2: Tails of return times & modulated moments
  • Theorem 2.3: Tails of the invariant measure
  • Theorem 2.4: Lower bounds on convergence rates
  • Remark 2.5
  • Lemma 2.6
  • Lemma 2.7
  • Remark 2.8
  • Remark 3.1
  • Theorem 3.2
  • ...and 70 more