Contraction of Markovian Operators in Orlicz Spaces and Error Bounds for Markov Chain Monte Carlo
Amedeo Roberto Esposito, Marco Mondelli
TL;DR
This work introduces a convergence framework for Markovian operators in Orlicz spaces, extending beyond traditional $L_p$ analyses. It establishes a general, closed-form upper bound on the contraction coefficient via duality with the kernel $K^\star$ and densities $g_X, g_Y$, enabling convergence analysis without requiring a spectral gap and accommodating heavy-tailed stationary distributions. The key theoretical contribution is a contraction theorem in Orlicz spaces that recovers classical results on ergodicity and mixing, sharpens bounds on the constants of strong data-processing inequalities (SDPIs), and yields improved mixing-time, burn-in, and concentration guarantees for Markov chain Monte Carlo (MCMC). The approach also gives concentration-of-measure results for dependent (Markovian) random variables, with potential applications to heavy-tailed settings and to bandit problems with Markovian rewards.
Abstract
We introduce a novel concept of convergence for Markovian processes within Orlicz spaces, extending beyond the conventional approach associated with $L_p$ spaces. After showing that Markovian operators are contractive in Orlicz spaces, our key technical contribution is an upper bound on their contraction coefficient, which admits a closed-form expression. The bound is tight in some settings, and it recovers well-known results, such as the connection between contraction and ergodicity, ultra-mixing, and Doeblin's minorisation. Specialising our approach to $L_p$ spaces leads to a significant improvement upon the classical Riesz-Thorin interpolation method. Furthermore, by exploiting the flexibility offered by Orlicz spaces, we can tackle settings where the stationary distribution is heavy-tailed, a severely under-studied setup. The technical tools introduced also lend themselves to novel bounds on the contraction coefficient (SDPI constant) of information-theoretic divergences, and we provide a variety of examples in which they improve over the state of the art. As an application of the framework, we derive tighter bounds on the mixing time of Markovian processes, better exponential concentration bounds for MCMC methods, and better lower bounds on the burn-in period. To conclude, we show how our results can be used to prove the concentration of measure phenomenon for a sequence of Markovian random variables.
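To make the notion of a contraction coefficient concrete, the sketch below works through the classical total-variation case on a finite state space, which is the kind of well-known result the abstract says the framework recovers. It is not the Orlicz-space bound of the paper. The 3-state kernel `K` and the function names `tv`, `dobrushin`, and `doeblin_alpha` are illustrative assumptions. The code computes the Dobrushin coefficient $\delta(K) = \max_{x,x'} \mathrm{TV}(K(x,\cdot), K(x',\cdot))$, checks it against the Doeblin minorisation bound $\delta(K) \le 1 - \alpha$ with $\alpha = \sum_y \min_x K(x,y)$, and verifies the contraction $\mathrm{TV}(\mu K^t, \nu K^t) \le \delta(K)^t \, \mathrm{TV}(\mu, \nu)$ numerically.

```python
import numpy as np

def tv(p, q):
    """Total-variation distance between two probability vectors."""
    return 0.5 * np.abs(p - q).sum()

def dobrushin(K):
    """Dobrushin coefficient: largest TV distance between any two rows of K."""
    n = K.shape[0]
    return max(tv(K[i], K[j]) for i in range(n) for j in range(n))

def doeblin_alpha(K):
    """Largest alpha with K(x, .) >= alpha * nu(.) for some probability nu.

    On a finite space this is the sum over columns of the column minimum.
    """
    return K.min(axis=0).sum()

# Hypothetical 3-state row-stochastic kernel, chosen only for illustration.
K = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

delta = dobrushin(K)
alpha = doeblin_alpha(K)

# Two initial distributions at maximal TV distance from each other.
mu = np.array([1.0, 0.0, 0.0])
nu = np.array([0.0, 0.0, 1.0])

# Track the TV distance after t steps against the bound delta**t * TV(mu, nu).
d0 = tv(mu, nu)
p, q = mu.copy(), nu.copy()
history = []
for t in range(1, 6):
    p, q = p @ K, q @ K
    history.append((t, tv(p, q), delta**t * d0))

for t, dist, bound in history:
    print(f"t={t}: TV={dist:.6f}  bound={bound:.6f}")
```

In this toy example the Doeblin bound and the Dobrushin coefficient coincide up to rounding, but in general $1 - \alpha$ is only an upper bound on $\delta(K)$.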
