Optimistic Estimation of Convergence in Markov Chains with the Average-Mixing Time

Geoffrey Wolfer; Pierre Alquier

TL;DR

The paper argues that the conventional worst-case mixing time $t_{\mathsf{mix}}$ is often pessimistic and hard to estimate from data. It advocates the average-mixing time $t^{\sharp}_{\mathsf{mix}}(\xi)$, defined via the stationary $\beta$-mixing coefficients, as an optimistic and estimable proxy for convergence of Markov chains, applicable to finite and countable state spaces. The authors develop single-trajectory estimators for the $\beta$-mixing coefficients and $t^{\sharp}_{\mathsf{mix}}(\xi)$, with explicit finite-sample guarantees under sub-exponential and polynomial mixing, and provide spectral, ergodic, and graph-structure-informed bounds. They also demonstrate implications across state-space scales, including finite spaces, countable spaces, and infinite graphs with controlled growth, and illustrate potential gaps between worst-case and average convergence using a two-point space example. Overall, the work offers a practical framework for data-driven convergence assessment and provides a pathway to more efficient probabilistic guarantees in learning and MCMC diagnostics with weak dependencies.
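To make the contrast concrete, here is a minimal sketch, not taken from the paper, that computes both quantities for a known transition matrix. It assumes the standard identity for a stationary Markov chain, $\beta(s) = \sum_x \pi(x)\,\|P^s(x,\cdot) - \pi\|_{\mathsf{TV}}$, and takes the average-mixing time to be the first $s \geq 1$ with $\beta(s) \leq \xi$; the worst-case mixing time replaces the $\pi$-average with a maximum over starting states. The two-state matrix, the threshold $\xi = 1/4$, and all function names are illustrative assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares (finite irreducible chain)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def tv_to_stationarity(P, pi, s):
    """Total variation distance between P^s(x, .) and pi, for every start state x."""
    Ps = np.linalg.matrix_power(P, s)
    return 0.5 * np.abs(Ps - pi).sum(axis=1)

def beta_coefficient(P, pi, s):
    """Assumed identity: stationary beta-mixing coefficient as a pi-weighted average."""
    return float(pi @ tv_to_stationarity(P, pi, s))

def worst_case_distance(P, pi, s):
    """Worst-case distance to stationarity: maximum over start states."""
    return float(tv_to_stationarity(P, pi, s).max())

def first_time_below(distance, xi, s_max=100_000):
    """Smallest s >= 1 with distance(s) <= xi (starting at s = 1 is an assumption)."""
    for s in range(1, s_max + 1):
        if distance(s) <= xi:
            return s
    raise ValueError("threshold not reached within s_max steps")

# Illustrative two-state chain: state 1 is rarely entered and slow to leave.
P = np.array([[1 - 1e-4, 1e-4],
              [1e-2, 1 - 1e-2]])
pi = stationary_distribution(P)
xi = 0.25

t_avg = first_time_below(lambda s: beta_coefficient(P, pi, s), xi)
t_worst = first_time_below(lambda s: worst_case_distance(P, pi, s), xi)
print("average-mixing time:", t_avg, "| worst-case mixing time:", t_worst)
```

With these illustrative parameters, the $\pi$-weighted average falls below $\xi$ at the first step, while the worst-case distance, driven by the rarely visited starting state, takes far longer to do so.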

Abstract

The convergence rate of a Markov chain to its stationary distribution is typically assessed using the concept of total variation mixing time. However, this worst-case measure often yields pessimistic estimates and is challenging to infer from observations. In this paper, we advocate for the use of the average-mixing time as a more optimistic and demonstrably easier-to-estimate alternative. We further illustrate its applicability across a range of settings, from two-point to countable spaces, and discuss some practical implications.
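As a rough illustration of what estimating from observations can look like, the following sketch computes a naive plug-in estimate of $\beta(s)$ from a single trajectory. It is not the estimator analyzed in the paper and carries none of its guarantees. It assumes the standard identity that, for a stationary chain, $\beta(s)$ equals the total variation distance between the joint law of $(X_t, X_{t+s})$ and the product $\pi \otimes \pi$, and replaces both with empirical frequencies. The function names, the example transition matrix, and the trajectory length are hypothetical choices.

```python
import numpy as np

def simulate_chain(P, n, rng, x0=0):
    """Draw a length-n trajectory from transition matrix P, started at x0."""
    traj = np.empty(n, dtype=int)
    traj[0] = x0
    for t in range(1, n):
        traj[t] = rng.choice(P.shape[0], p=P[traj[t - 1]])
    return traj

def beta_plugin(traj, s, n_states):
    """Plug-in estimate of beta(s) for s >= 1: TV between empirical pair law and product of marginals."""
    traj = np.asarray(traj)
    # Empirical marginal over the whole trajectory.
    pi_hat = np.bincount(traj, minlength=n_states) / traj.size
    # Empirical joint frequencies of (X_t, X_{t+s}).
    pairs = np.zeros((n_states, n_states))
    np.add.at(pairs, (traj[:-s], traj[s:]), 1.0)
    q_hat = pairs / pairs.sum()
    return float(0.5 * np.abs(q_hat - np.outer(pi_hat, pi_hat)).sum())

# Hypothetical example: a two-state chain observed for 5000 steps.
rng = np.random.default_rng(0)
P_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
traj = simulate_chain(P_demo, 5000, rng)
beta_hats = {s: beta_plugin(traj, s, n_states=2) for s in (1, 5, 20)}
print(beta_hats)
```

The estimate is biased upward by sampling noise at large lags, so a small positive value should not be read as evidence of residual dependence without a confidence bound.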

Paper Structure (41 sections, 25 theorems, 236 equations, 5 figures)

Introduction
Related work
Estimation of mixing parameters in Markov chains
Average-mixing time
Main contributions
Highlight the significance of the average-mixing time
Estimation of the average-mixing time
Notation and setting
Outline
The average-mixing time
Average-mixing versus worst-case mixing
Deviation of additive functionals of Markov chains
Spectral methods under reversibility and geometric ergodicity
Estimation of average convergence from a single trajectory
Estimation of $\beta$-mixing coefficients
...and 26 more sections

Key Result

Proposition 2.1

Let $M \in \mathbb{R}_+$ be arbitrarily large. There exists a transition operator $P$ such that for any $\xi \in (0,1)$, it holds that

Figures (5)

Figure 1: Logical flow between the estimation results of Section \ref{section:estimation}. MAD: Mean Absolute Deviation; PAC: Probably Approximately Correct.
Figure 2: Average-mixing time $\xi(1 \pm \varepsilon)$ band.
Figure 3: Transient regime for the entropic term $\mathcal{J}_p^{(s)}$.
Figure 4: Relation between the various ergodic conditions in Subsection \ref{subsec:implications-ergodic}. In this diagram, $\Rightarrow$ is to be read as "implies under the assumptions of the result referred to".
Figure 5: Decomposing the process into Bernstein blocks.

Theorems & Definitions (47)

Remark 2.1
Remark 2.2: Average-mixing time is stationary $\beta$-mixing time
Definition 2.1
Proposition 2.1
proof
Example 2.1: Birth and death Markov chains
Lemma 2.1: Bounding deviation of additive functionals evaluated on Markov chains
proof: Proof sketch
Lemma 2.2
Theorem 3.1: Mean Absolute Deviation
...and 37 more
