Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning
R. Srikant
TL;DR
This work proves non-asymptotic central limit theorems for vector-valued martingales and for functions of Markov chains by combining Lindeberg-style decompositions, Stein's method, and Poisson's equation. It provides explicit rates in Wasserstein distance that quantify how fast the distributions of normalized sums approach their Gaussian limits, and it extends these results to Markov chains on both finite and general state spaces. The author then applies the Markov-chain CLT to Temporal Difference (TD) learning with Polyak-Ruppert averaging, deriving a concrete rate for the distributional convergence of the averaged TD iterates under decaying step sizes. The results give finite-time normal-approximation guarantees for TD learning, and potentially for other stochastic approximation schemes with Markovian noise, connecting asymptotic variance characterizations to finite-sample performance.
Abstract
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
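To make the object of study concrete, the following is a minimal sketch of TD(0) with Polyak-Ruppert averaging on a two-state Markov reward process. It is an illustration only, not the paper's setup: the function name `td0_polyak_ruppert`, the transition matrix, rewards, tabular features, discount factor, and the step-size schedule k^(-0.75) are all assumptions chosen for the example. The averaged iterate `theta_avg` is the quantity whose distribution a CLT of this kind would describe.

```python
import numpy as np


def td0_polyak_ruppert(P, r, Phi, gamma, n_steps=20000, step_exp=0.75, seed=0):
    """TD(0) with linear features and a running (Polyak-Ruppert) average.

    Hypothetical helper for illustration; the step size k**(-step_exp)
    is an assumed decaying schedule, not one taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)      # current TD iterate
    theta_bar = np.zeros(d)  # running average of the iterates
    s = 0                    # start the chain in state 0
    for k in range(1, n_steps + 1):
        # Sample the next state of the Markov chain.
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        # Stochastic-approximation update with a decaying step size.
        theta = theta + k ** (-step_exp) * td_error * Phi[s]
        # Incremental form of the average (1/k) * sum of iterates.
        theta_bar += (theta - theta_bar) / k
        s = s_next
    return theta, theta_bar


# Assumed toy problem: two states, tabular (identity) features.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
Phi = np.eye(2)
gamma = 0.5

theta_last, theta_avg = td0_polyak_ruppert(P, r, Phi, gamma)

# With tabular features the TD fixed point is the true value function,
# which solves the Bellman equation (I - gamma * P) v = r.
v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print("averaged TD estimate:", theta_avg)
print("true value function: ", v_true)
```

Repeating the run over many seeds and inspecting the spread of `np.sqrt(n_steps) * (theta_avg - v_true)` would give an empirical view of the Gaussian approximation that the paper quantifies.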
