Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation

Anupama Sridhar; Alexander Johansen

TL;DR

A novel discrete-time coupling that bypasses geometric ergodicity yields the first high-probability, finite-sample guarantee for nonlinear TD(0) under realistic mixing.

Abstract

Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis of vanilla TD(0) on polynomially mixing Markov data, assuming only Hölder continuity and bounded generalized gradients. This breaks with previous work, which often requires subsampling, projections, or instance-dependent step-sizes. Concretely, for mixing exponent $\beta > 1$, Hölder continuity exponent $\gamma$, and step-size decay rate $\eta \in (1/2, 1]$, we show that, with high probability, \[ \| \theta_t - \theta^* \| \leq C(\beta, \gamma, \eta)\, t^{-\beta/2} + C'(\gamma, \eta)\, t^{-\eta\gamma} \] after $t = \mathcal{O}(1/\varepsilon^2)$ iterations. These bounds match the known i.i.d. rates and hold even when initialization is nonstationary. Central to our proof is a novel discrete-time coupling that bypasses geometric ergodicity, yielding the first such guarantee for nonlinear TD(0) under realistic mixing.
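
To make the setting concrete, the following is a minimal sketch of vanilla TD(0) with a nonlinear value function and a polynomially decaying step size $\alpha_t = t^{-\eta}$, run on a single Markov trajectory without projection or subsampling. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the three-state chain `P`, the rewards `r`, the features `phi`, the `tanh` parameterization, the discount factor `discount`, and the function name `td0_nonlinear`. A finite chain like this one mixes geometrically, so the sketch illustrates only the update rule, not the polynomial-mixing regime analyzed in the paper.

```python
import numpy as np

def td0_nonlinear(P, r, phi, discount=0.5, eta=0.75, steps=5000, seed=0):
    """Vanilla TD(0) with v_theta(s) = tanh(phi[s] @ theta) on one trajectory.

    Hypothetical illustration: no projection, no subsampling,
    step size alpha_t = t**(-eta) with eta in (1/2, 1].
    """
    rng = np.random.default_rng(seed)
    n_states, dim = phi.shape
    theta = np.zeros(dim)
    s = 0  # fixed start state, i.e. a nonstationary initialization
    for t in range(1, steps + 1):
        s_next = rng.choice(n_states, p=P[s])       # sample next state of the chain
        v = np.tanh(phi[s] @ theta)                 # current value estimate
        v_next = np.tanh(phi[s_next] @ theta)       # bootstrap target value
        delta = r[s] + discount * v_next - v        # TD error
        grad = (1.0 - v ** 2) * phi[s]              # gradient of tanh(phi[s] @ theta)
        theta = theta + t ** (-eta) * delta * grad  # semi-gradient TD(0) step
        s = s_next
    return theta

# Assumed toy problem: a slowly moving three-state chain with small rewards.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 0.05, 0.1])
phi = np.eye(3)  # one-hot features, so theta has one entry per state

theta = td0_nonlinear(P, r, phi)
print(theta)
```

The step-size exponent `eta` plays the role of $\eta$ in the bound above; varying it within $(1/2, 1]$ is one simple way to observe empirically how the decay rate affects the iterates.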
