Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

R. Srikant

TL;DR

This work proves non-asymptotic central limit theorems for vector-valued martingales and for functions of Markov chains by combining Lindeberg-style decompositions, Stein's method, and Poisson’s equation. It gives explicit rates, measured in Wasserstein distance, at which the distributions of normalized sums approach their Gaussian limits, and it covers Markov chains on both finite and general state spaces. The author then applies the Markov-chain CLT to Temporal Difference (TD) learning with Polyak-Ruppert averaging and derives a concrete rate for the distributional convergence of the averaged TD iterates under decaying step-sizes. The results give finite-time normal-approximation guarantees for TD learning, and potentially for other stochastic approximation schemes with Markovian noise, linking asymptotic-variance characterizations to finite-sample performance.
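As a rough numerical illustration of the Markov-chain CLT described above, the sketch below solves Poisson's equation on a small finite chain, computes the asymptotic variance from the resulting martingale decomposition, and compares simulated normalized sums to the Gaussian limit with a quantile-based Wasserstein-1 estimate. The two-state chain, the function names, and the quantile estimator are illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()


def poisson_solution(P, f):
    """Solve (I - P) V = f - pi(f) via the fundamental matrix; returns (V, pi)."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    f_bar = f - pi @ f
    V = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f_bar)
    return V, pi


def asymptotic_variance(P, f):
    """sigma^2 = pi(V^2) - pi((PV)^2): variance of the martingale differences."""
    V, pi = poisson_solution(P, f)
    return float(pi @ V**2 - pi @ (P @ V) ** 2)


def normalized_sums(P, f, n_steps, n_reps, rng):
    """Simulate n_reps stationary two-state chains; return S_n / sqrt(n)."""
    pi = stationary_distribution(P)
    f_bar = f - pi @ f
    x = (rng.random(n_reps) < pi[1]).astype(int)  # start in stationarity
    total = np.zeros(n_reps)
    for _ in range(n_steps):
        total += f_bar[x]
        # Leave the current state with probability P[0, 1] or P[1, 0].
        flip = rng.random(n_reps) < np.where(x == 0, P[0, 1], P[1, 0])
        x = np.where(flip, 1 - x, x)
    return total / np.sqrt(n_steps)


def w1_to_gaussian(samples, sigma):
    """Quantile-based estimate of W1 between the samples and N(0, sigma^2)."""
    m = len(samples)
    std_normal = NormalDist()
    q = np.array([std_normal.inv_cdf((i + 0.5) / m) for i in range(m)])
    return float(np.mean(np.abs(np.sort(samples) - sigma * q)))


if __name__ == "__main__":
    P = np.array([[0.7, 0.3], [0.2, 0.8]])  # hypothetical two-state chain
    f = np.array([1.0, -1.0])
    sigma2 = asymptotic_variance(P, f)
    z = normalized_sums(P, f, 1000, 4000, np.random.default_rng(0))
    print("asymptotic variance:", sigma2)
    print("empirical variance: ", z.var())
    print("W1 estimate:        ", w1_to_gaussian(z, np.sqrt(sigma2)))
```

For this particular chain the asymptotic variance works out to 2.88 by hand, which gives a quick sanity check on the Poisson-equation step. The W1 estimate also includes sampling error from the finite number of replications, so it only loosely reflects the rate the paper bounds.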

Abstract

We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
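To make the TD setting concrete, here is a minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert averaging, run along a single Markovian trajectory. The step-size schedule `c / (k + 1) ** alpha` with `alpha = 0.75`, the three-state reward process, and the function names are assumptions chosen for illustration; the summary above does not specify the schedule used in the paper.

```python
import numpy as np


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r for the TD fixed point."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()  # stationary distribution of the chain
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def averaged_td(P, r, Phi, gamma, n_steps, rng, c=1.0, alpha=0.75):
    """TD(0) with decaying step-size; returns the running average of the iterates."""
    n_states, d = Phi.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    x = rng.integers(n_states)
    for k in range(n_steps):
        x_next = rng.choice(n_states, p=P[x])
        # Temporal-difference error for the observed transition x -> x_next.
        td_error = r[x] + gamma * Phi[x_next] @ theta - Phi[x] @ theta
        step = c / (k + 1) ** alpha  # assumed schedule, not taken from the paper
        theta = theta + step * td_error * Phi[x]
        # Polyak-Ruppert averaging as an incremental mean of the iterates.
        theta_bar += (theta - theta_bar) / (k + 1)
        x = x_next
    return theta_bar


if __name__ == "__main__":
    # Hypothetical three-state Markov reward process with tabular features.
    P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
    r = np.array([1.0, 0.0, -1.0])
    Phi = np.eye(3)
    theta_star = td_fixed_point(P, r, Phi, 0.5)
    theta_bar = averaged_td(P, r, Phi, 0.5, 20000, np.random.default_rng(0))
    print("fixed point:     ", theta_star)
    print("averaged iterate:", theta_bar)
```

Repeating the run over many seeds and rescaling `theta_bar - theta_star` by the square root of the number of steps would give samples whose distribution could be compared with the Gaussian limit that the paper's rate concerns.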

Paper Structure (21 sections, 6 theorems, 77 equations)

Key Result

Lemma 1

Let $g\in Lip_L$ and $f$ be the solution to Stein's equation: $$\Delta f(x) - x^\top \nabla f(x) = g(x) - E[g(Z)],$$ where $\Delta$ and $\nabla$ are the Laplacian and gradient operators, respectively, and $Z\sim N(0,I)$. Then, $f$ has the following regularity properties for any $\beta\in (0,1)$: $\diamond$

Theorems & Definitions (10)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof