Optimistic Estimation of Convergence in Markov Chains with the Average-Mixing Time

Geoffrey Wolfer, Pierre Alquier

TL;DR

The paper argues that the conventional worst-case mixing time $t_{\mathsf{mix}}$ is often pessimistic and hard to estimate from data. It advocates the average-mixing time $t^{\sharp}_{\mathsf{mix}}(\xi)$, defined via the stationary $\beta$-mixing coefficients, as an optimistic and estimable proxy for convergence of Markov chains, applicable to finite and countable state spaces. The authors develop single-trajectory estimators for the $\beta$-mixing coefficients and $t^{\sharp}_{\mathsf{mix}}(\xi)$, with explicit finite-sample guarantees under sub-exponential and polynomial mixing, and provide spectral, ergodic, and graph-structure-informed bounds. They also demonstrate implications across state-space scales, including finite spaces, countable spaces, and infinite graphs with controlled growth, and illustrate potential gaps between worst-case and average convergence using a two-point space example. Overall, the work offers a practical framework for data-driven convergence assessment and provides a pathway to more efficient probabilistic guarantees in learning and MCMC diagnostics with weak dependencies.

Abstract

The convergence rate of a Markov chain to its stationary distribution is typically assessed using the concept of total variation mixing time. However, this worst-case measure often yields pessimistic estimates and is challenging to infer from observations. In this paper, we advocate for the use of the average-mixing time as a more optimistic and demonstrably easier-to-estimate alternative. We further illustrate its applicability across a range of settings, from two-point to countable spaces, and discuss some practical implications.
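
To make the contrast between worst-case and average convergence concrete, the following is a minimal sketch, not the authors' estimator. It assumes a finite state space and a fully known transition matrix, whereas the paper estimates these quantities from a single trajectory. It uses the standard identity that, for a stationary Markov chain, the $\beta$-mixing coefficient at lag $s$ is the $\pi$-weighted average over starting states $x$ of the total variation distance between $P^s(x, \cdot)$ and $\pi$, while the worst-case distance takes the maximum over $x$. All function names, the two-state chain with parameters `p` and `q`, and the convention of searching over $s \geq 1$ are illustrative assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()

def tv_to_stationarity(P, s, pi):
    """Total variation distance between P^s(x, .) and pi, for each start x."""
    Ps = np.linalg.matrix_power(P, s)
    return 0.5 * np.abs(Ps - pi).sum(axis=1)

def worst_case_distance(P, s, pi):
    """Maximum over starting states (the quantity behind t_mix)."""
    return tv_to_stationarity(P, s, pi).max()

def average_distance(P, s, pi):
    """pi-weighted average over starting states (stationary beta-mixing)."""
    return float(pi @ tv_to_stationarity(P, s, pi))

def first_time_below(distance, P, pi, xi, s_max=10_000):
    """Smallest s >= 1 with distance(P, s, pi) <= xi (assumed convention)."""
    for s in range(1, s_max + 1):
        if distance(P, s, pi) <= xi:
            return s
    raise ValueError("threshold not reached within s_max steps")

# Hypothetical two-state chain: state 0 is left rarely, state 1 is left often,
# so almost all stationary mass sits on state 0.
p, q = 0.001, 0.1
P = np.array([[1.0 - p, p],
              [q, 1.0 - q]])
pi = stationary_distribution(P)
xi = 0.1

t_worst = first_time_below(worst_case_distance, P, pi, xi)
t_avg = first_time_below(average_distance, P, pi, xi)
print("worst-case mixing time:", t_worst)
print("average-mixing time:   ", t_avg)
```

In this chain the slow-to-converge starting state carries very little stationary mass, so the average distance falls below the threshold well before the worst-case distance does. This mirrors, in a toy setting only, the kind of gap the two-point example in the paper is meant to illustrate.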

Paper Structure (41 sections, 25 theorems, 236 equations, 5 figures)

This paper contains 41 sections, 25 theorems, 236 equations, 5 figures.

Key Result

Proposition 2.1

Let $M \in \mathbb{R}_+$ be arbitrarily large. There exists a transition operator $P$ such that for any $\xi \in (0,1)$, it holds that

Figures (5)

  • Figure 1: Logical flow between the estimation results of Section \ref{section:estimation}. MAD: Mean Absolute Deviation; PAC: Probably Approximately Correct.
  • Figure 2: Average-mixing time $\xi(1 \pm \varepsilon)$ band.
  • Figure 3: Transient regime for the entropic term $\mathcal{J}_p^{(s)}$.
  • Figure 4: Relation between the various ergodic conditions in Subsection \ref{subsec:implications-ergodic}. In this diagram, $\Rightarrow$ is to be read as "implies under the assumptions of the result referred to".
  • Figure 5: Decomposing the process into Bernstein blocks.

Theorems & Definitions (47)

  • Remark 2.1
  • Remark 2.2: Average-mixing time is stationary $\beta$-mixing time
  • Definition 2.1
  • Proposition 2.1
  • Proof
  • Example 2.1: Birth and death Markov chains
  • Lemma 2.1: Bounding deviation of additive functionals evaluated on Markov chains
  • Proof: Proof sketch
  • Lemma 2.2
  • Theorem 3.1: Mean Absolute Deviation
  • ...and 37 more