Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation

Anupama Sridhar, Alexander Johansen

TL;DR

A novel discrete-time coupling that bypasses geometric ergodicity yields the first high-probability, finite-sample guarantee for nonlinear TD(0) under realistic (polynomial) mixing.

Abstract

Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis of vanilla TD(0) on polynomially mixing Markov data, assuming only Hölder continuity and bounded generalized gradients. This breaks with previous work, which often requires subsampling, projections, or instance-dependent step sizes. Concretely, for mixing exponent $β > 1$, Hölder continuity exponent $γ$, and step-size decay rate $η \in (1/2, 1]$, we show that, with high probability, \[ \| θ_t - θ^* \| \leq C(β, γ, η)\, t^{-β/2} + C'(γ, η)\, t^{-ηγ} \] after $t = \mathcal{O}(1/\varepsilon^2)$ iterations. These bounds match the known i.i.d. rates and hold even when initialization is nonstationary. Central to our proof is a novel discrete-time coupling that bypasses geometric ergodicity, yielding the first such guarantee for nonlinear TD(0) under realistic mixing.
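For readers who want the update rule in concrete form, the following is a minimal sketch of vanilla semi-gradient TD(0) with a nonlinear value function and a polynomially decaying step size $α_t = α_0 t^{-η}$, run along a single Markov trajectory without projections or subsampling. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-state chain, the reward, the `tanh` value model, the constant `alpha0`, and the discount factor (named `discount` to avoid confusion with the Hölder exponent $γ$). The toy chain is in fact geometrically ergodic, so it illustrates only the algorithm, not the polynomial-mixing regime that the analysis targets.

```python
import math
import random


def value(theta, s):
    """Nonlinear value estimate V_theta(s) = theta[1] * tanh(theta[0] * s)."""
    return theta[1] * math.tanh(theta[0] * s)


def grad_value(theta, s):
    """Gradient of V_theta(s) with respect to theta."""
    th = math.tanh(theta[0] * s)
    return [theta[1] * s * (1.0 - th * th), th]


def step_chain(s, rng):
    """Hypothetical two-state chain on {-1.0, +1.0}: stay with probability 0.9."""
    return s if rng.random() < 0.9 else -s


def reward(s):
    """Hypothetical reward: equal to the current state."""
    return s


def td0(num_steps, eta=0.75, alpha0=0.5, discount=0.9, seed=0):
    """Run vanilla TD(0) along one trajectory and return the final parameters."""
    rng = random.Random(seed)
    theta = [0.5, 0.5]
    s = 1.0  # deterministic start, i.e. not drawn from the stationary law
    for t in range(1, num_steps + 1):
        s_next = step_chain(s, rng)
        # TD error: one-step bootstrapped target minus current estimate.
        delta = reward(s) + discount * value(theta, s_next) - value(theta, s)
        alpha = alpha0 * t ** (-eta)  # step-size decay rate eta in (1/2, 1]
        g = grad_value(theta, s)
        # Semi-gradient update: the target is treated as a constant.
        theta = [theta[i] + alpha * delta * g[i] for i in range(2)]
        s = s_next
    return theta


if __name__ == "__main__":
    theta = td0(5000)
    print("theta:", theta, "V(+1):", value(theta, 1.0))
```

The update uses each transition exactly once and in order, which is the "vanilla" setting the abstract contrasts with subsampled or projected variants.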

Paper Structure

This paper contains 124 sections, 14 theorems, 265 equations, 1 table.

Key Result

Lemma 4.3

Suppose the polynomial mixing and drift condition assumptions hold. Then the Markov chain $\{x_t\}$ is polynomially ergodic. In particular, there exists a constant $C > 0$ such that for all $x_0 \in \mathcal{S}$ and measurable sets $A \subseteq \mathcal{S}$, the probability that the chain started at $x_0$ lies in $A$ at time $t$ approaches $\pi(A)$ at a polynomial rate in $t$ with constant $C$, where $\pi$ denotes the unique stationary distribution of the chain.

Theorems & Definitions (44)

  • Lemma 4.3: Polynomial Ergodicity
  • proof
  • Lemma 5.1: Dependent Blocks
  • proof
  • Lemma 5.2: Coupling Under Polynomial Ergodicity
  • proof
  • Lemma 5.3: Covariance Between Blocks
  • proof
  • Lemma 5.4: Concentration Under Polynomial Ergodicity
  • proof
  • ...and 34 more