A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation

Aritra Mitra

TL;DR

The paper addresses finite-time convergence of TD(0) with linear function approximation under Markovian sampling and aims to avoid projection-based analysis. It introduces a two-step inductive method: first establishing uniform boundedness of the iterates for a constant step-size $α$, then deriving a recursion for $d_t=\mathbb{E}\|θ_t-θ^*\|^2$ in which Markov noise appears as an $O(α^2)$ perturbation to the steady-state dynamics, yielding exponential convergence to a ball of radius $O(α)$. A key result shows that when $α$ is constrained by the mixing time $τ$ (e.g., $α ≤ ω(1-γ)/(Cτ)$), all iterates remain bounded and the recursion contracts up to a small disturbance, leading to a simple, projection-free finite-time guarantee. The analysis extends to TD($λ$) and Q-learning variants under standard Lipschitz and contractivity conditions, and to nonlinear stochastic approximation with Markovian noise, offering a robust blueprint for handling perturbations such as delays. This framework improves conceptual clarity, enables approximate step-size design from mixing-time estimates, and suggests avenues for extending to neural function approximators and multi-agent settings.
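
To see why an $O(α^2)$ perturbation yields a ball of radius $O(α)$, it may help to unroll a schematic version of the recursion. The constants $c > 0$ and $B > 0$ below are hypothetical placeholders (not the paper's constants), and the sketch assumes $0 < cα < 1$:

$$d_{t+1} \le (1 - cα)\, d_t + Bα^2 \quad\Longrightarrow\quad d_t \le (1 - cα)^t d_0 + Bα^2 \sum_{k=0}^{t-1} (1 - cα)^k \le (1 - cα)^t d_0 + \frac{Bα}{c}.$$

The first term decays exponentially in $t$, while the geometric sum is bounded by $1/(cα)$, so the residual error is of order $α$.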

Abstract

We study the finite-time convergence of TD learning with linear function approximation under Markovian sampling. Existing proofs for this setting either assume a projection step in the algorithm to simplify the analysis, or require a fairly intricate argument to ensure stability of the iterates. We ask: Is it possible to retain the simplicity of a projection-based analysis without actually performing a projection step in the algorithm? Our main contribution is to show this is possible via a novel two-step argument. In the first step, we use induction to prove that under a standard choice of a constant step-size $α$, the iterates generated by TD learning remain uniformly bounded in expectation. In the second step, we establish a recursion that mimics the steady-state dynamics of TD learning up to a bounded perturbation on the order of $O(α^2)$ that captures the effect of Markovian sampling. Combining these pieces leads to an overall approach that considerably simplifies existing proofs. We conjecture that our inductive proof technique will find applications in the analyses of more complex stochastic approximation algorithms, and conclude by providing some examples of such applications.
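
For readers who want to see the setting concretely, the following is a minimal sketch of projection-free TD(0) with linear function approximation, run along a single Markovian trajectory with a constant step-size. The three-state chain, rewards, features, and all function names (`example_chain`, `td_fixed_point`, `td0`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def example_chain():
    """A hypothetical 3-state chain with 2 unit-norm features (illustration only)."""
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])      # transition matrix
    r = np.array([0.0, 0.0, 1.0])           # reward received in each state
    s2 = 1.0 / np.sqrt(2.0)
    Phi = np.array([[1.0, 0.0],
                    [s2, s2],
                    [0.0, 1.0]])            # feature matrix, one row per state
    pi = np.array([0.25, 0.50, 0.25])       # stationary distribution of P
    return P, r, Phi, pi

def td_fixed_point(P, r, Phi, pi, gamma):
    """Solve Phi^T D (Phi - gamma P Phi) theta = Phi^T D r for the TD fixed point."""
    D = np.diag(pi)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

def td0(P, r, Phi, gamma, alpha, n_steps, seed=0):
    """TD(0) with a constant step-size on one trajectory; no projection step."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    theta = np.zeros(Phi.shape[1])
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])                 # Markovian sample
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]             # unprojected update
        s = s_next
    return theta

if __name__ == "__main__":
    P, r, Phi, pi = example_chain()
    gamma = 0.5
    theta_star = td_fixed_point(P, r, Phi, pi, gamma)
    theta = td0(P, r, Phi, gamma, alpha=0.02, n_steps=20000)
    print("distance to fixed point:", np.linalg.norm(theta - theta_star))
```

Shrinking `alpha` in this sketch should shrink the residual distance to the fixed point at the cost of slower convergence, which is the trade-off the constant step-size analysis describes.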

Paper Structure (6 sections, 9 theorems, 54 equations)

This paper contains 6 sections, 9 theorems, 54 equations.

Introduction
Background on TD Learning
Convergence Analysis
Applications of our Analysis Technique
Conclusion
Omitted Proofs

Key Result

Lemma 1

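The following holds $\forall \theta \in \mathbb{R}^K$: $\langle \theta^* - \theta, \bar{g}(\theta) \rangle \geq \omega (1-\gamma) \|\theta^* - \theta\|^2$, where $\bar{g}(\theta)$ is the steady-state TD update direction, $\theta^*$ is its root, and $\omega$ is the smallest eigenvalue of the matrix $\Sigma = \Phi^\top D \Phi$.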
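
A lemma of this type can be checked numerically. The sketch below assumes the standard form of the inequality from the TD literature, $\langle \theta^* - \theta, \bar{g}(\theta) \rangle \geq \omega(1-\gamma)\|\theta^* - \theta\|^2$, with $\bar{g}(\theta) = \Phi^\top D (r + \gamma P \Phi \theta - \Phi \theta)$ and $D$ the diagonal matrix of stationary probabilities. The random chain, the reward vector `r`, and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()

def lemma1_gap(n_states=6, n_features=3, gamma=0.9, n_trials=200, seed=0):
    """Return (smallest observed LHS - RHS of the inequality, omega)."""
    rng = np.random.default_rng(seed)
    # Random chain with strictly positive transitions (irreducible, aperiodic).
    P = rng.random((n_states, n_states)) + 0.1
    P /= P.sum(axis=1, keepdims=True)
    r = rng.random(n_states)
    Phi = rng.standard_normal((n_states, n_features))

    D = np.diag(stationary_distribution(P))
    Sigma = Phi.T @ D @ Phi
    omega = np.linalg.eigvalsh(Sigma).min()   # smallest eigenvalue of Sigma

    # TD fixed point: the root of the steady-state update direction g_bar.
    A = Phi.T @ D @ (np.eye(n_states) - gamma * P) @ Phi
    theta_star = np.linalg.solve(A, Phi.T @ D @ r)

    def g_bar(theta):
        return Phi.T @ D @ (r + gamma * P @ Phi @ theta - Phi @ theta)

    gaps = []
    for _ in range(n_trials):
        theta = 10.0 * rng.standard_normal(n_features)
        diff = theta_star - theta
        lhs = diff @ g_bar(theta)
        rhs = omega * (1.0 - gamma) * (diff @ diff)
        gaps.append(lhs - rhs)
    return min(gaps), omega

if __name__ == "__main__":
    gap, omega = lemma1_gap()
    print("omega:", omega, "smallest gap:", gap)
```

A nonnegative smallest gap (up to floating-point error) is consistent with the inequality on the sampled points; it is a sanity check, not a proof.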

Theorems & Definitions (17)

Definition 1
Lemma 1
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
Theorem 1
proof
...and 7 more
