Contraction of Markovian Operators in Orlicz Spaces and Error Bounds for Markov Chain Monte Carlo

Amedeo Roberto Esposito, Marco Mondelli

TL;DR

This work introduces a convergence framework for Markovian operators in Orlicz spaces, extending beyond traditional $L_p$ analyses. It establishes a general, closed-form bound on the contraction coefficient via duality with the kernel $K^\star$ and densities $g_X,g_Y$, enabling convergence analysis without spectral gaps and accommodating heavy-tailed stationary distributions. The key theoretical contribution is a main contraction theorem in Orlicz spaces, which recovers classical results on ergodicity and mixing, sharpens bounds for strong data-processing inequalities, and yields improved mixing-time, burn-in, and concentration guarantees for Markov-chain Monte Carlo. The approach offers practical benefits in dependent-measure concentration and has potential applications to heavy-tailed settings and bandit problems with Markovian rewards, representing a significant methodological advance in the analysis of Markov processes.

Abstract

We introduce a novel concept of convergence for Markovian processes within Orlicz spaces, extending beyond the conventional approach associated with $L_p$ spaces. After showing that Markovian operators are contractive in Orlicz spaces, our key technical contribution is an upper bound on their contraction coefficient, which admits a closed-form expression. The bound is tight in some settings, and it recovers well-known results, such as the connection between contraction and ergodicity, ultra-mixing and Doeblin's minorisation. Specialising our approach to $L_p$ spaces leads to a significant improvement upon classical Riesz-Thorin's interpolation methods. Furthermore, by exploiting the flexibility offered by Orlicz spaces, we can tackle settings where the stationary distribution is heavy-tailed, a severely under-studied setup. The technical tools introduced lend themselves to providing novel bounds on the contraction coefficient (SDPI constant) of information-theoretic divergences. We thus provide a variety of examples in which we show an improvement over the state of the art. As an application of the framework put forward in the paper, we introduce tighter bounds on the mixing time of Markovian processes, better exponential concentration bounds for MCMC methods, and better lower bounds on the burn-in period. To conclude, we show how our results can be used to prove the concentration of measure phenomenon for a sequence of Markovian random variables.

Paper Structure (18 sections, 22 theorems, 161 equations, 9 figures)

Key Result

Lemma 1

Let $\mu$ be a positive measure, $\nu$ a positive measure such that $\nu\ll\mu$, and $f$ the Radon-Nikodym derivative $\frac{d\nu}{d\mu}$. Let $K$ be a Markov kernel and $g=\frac{d\nu K}{d\mu K}$. Then $g = K_\mu^\star f$, where $K_\mu^\star$ is the operator such that, for any two functions $h,f$, one has $\langle Kh, f \rangle_\mu = \langle h, K_\mu^\star f\rangle_{\mu K}.$
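
On a finite state space the adjoint relation in Lemma 1 can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a randomly generated, strictly positive row-stochastic matrix `K` and strictly positive probability vectors `mu` and `nu` (so that $\nu\ll\mu$ holds trivially), and it builds the adjoint explicitly as $K_\mu^\star(y,x) = \mu(x)K(x,y)/(\mu K)(y)$. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # size of the (assumed) finite state space

# Random row-stochastic kernel K with strictly positive entries.
K = rng.random((n, n)) + 0.1
K /= K.sum(axis=1, keepdims=True)

# Two strictly positive probability vectors, so nu << mu.
mu = rng.random(n) + 0.1
mu /= mu.sum()
nu = rng.random(n) + 0.1
nu /= nu.sum()

f = nu / mu          # f = d(nu) / d(mu)
mu_K = mu @ K        # push-forward measure mu K
nu_K = nu @ K        # push-forward measure nu K
g = nu_K / mu_K      # g = d(nu K) / d(mu K)

# Adjoint kernel: K_star[y, x] = mu[x] * K[x, y] / (mu K)[y].
K_star = (K * mu[:, None]).T / mu_K[:, None]

# Check the defining duality <K h, f>_mu = <h, K_star f>_{mu K}
# on a random test function h.
h = rng.standard_normal(n)
lhs = np.sum(mu * (K @ h) * f)
rhs = np.sum(mu_K * h * (K_star @ f))

print("adjoint identity holds:", np.isclose(lhs, rhs))
print("g equals K_star f:     ", np.allclose(g, K_star @ f))
```

In this discrete setting $K_\mu^\star$ is itself a row-stochastic matrix (the time-reversed, or backward, kernel with respect to $\mu$), which is why applying it to the density $f$ yields another density, this time with respect to $\mu K$.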

Figures (9)

  • Figure 1: Comparison between the bounds of \ref{eq:boundSteinInterp} and \ref{eq:ourBoundSemigroup} when applied to 8 randomly generated stochastic matrices of dimension $2 \times 2$. We set $t_\infty=2$ and the bounds are computed as a function of $t<t_\infty$.
  • Figure 2: Behaviour of the bound from \cite{sdpiRaginsky} and of \ref{eq:boundEtaChiGraphPath}, as a function of $\lambda$.
  • Figure 3: Comparison between \ref{eq:boundEtaKLBinary} and the bound from \cite{sdpiRaginsky} for two different choices of $p$ and as a function of $q\leq p$. We set $\lambda=\kappa=0.1$.
  • Figure 4: Behaviour of \ref{eq:boundKLGraphOurs} and \ref{eq:boundKLGraphRaginsky}, as a function of $\lambda$, for different choices of $|V|$. We only consider ranges of values of $\lambda$ for which at least one of the bounds is non-trivial (less than or equal to $1$). The distribution $\nu$ is generated randomly from the corresponding simplex.
  • Figure 5: Bound on the contraction coefficient given by \ref{thm:alphaNormsContraction} and \ref{eq:contractionMarkovChainLalpha} in $100$ randomly generated $5\times 5$ stochastic matrices, for $p=100$ and $t=10$.
  • ...and 4 more figures

Theorems & Definitions (59)

  • Definition 1: Markov kernel
  • Lemma 1
  • Lemma 2
  • proof
  • Proposition 1
  • Definition 2: Young function, \cite{theoryOrliczSpaces}
  • Definition 3: \cite{theoryOrliczSpaces}
  • Definition 4
  • Theorem 1
  • Theorem 2
  • ...and 49 more