Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

Jiuqi Wang; Shangtong Zhang

TL;DR

This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features, and introduces a novel characterization of bounded invariant sets of the mean ODE of linear TD.

Abstract

Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique point, this convergence traditionally requires the assumption that the features used by the approximator are linearly independent. However, this linear independence assumption does not hold in many practical scenarios. This work is the first to establish the almost sure convergence of linear TD without requiring linearly independent features. In fact, we do not make any assumptions on the features. We prove that the approximated value function converges to a unique point and the weight iterates converge to a set. We also establish a notion of local stability of the weight iterates. Importantly, we do not need to introduce any other additional assumptions and do not need to make any modification to the linear TD algorithm. Key to our analysis is a novel characterization of bounded invariant sets of the mean ODE of linear TD.
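To make the distinction between value convergence and weight convergence concrete, the following is a minimal sketch, not the paper's construction. It integrates the expected (mean) linear TD update with Euler steps on a small, hypothetical 3-state Markov reward process. The transition matrix `P`, rewards `r`, discount `gamma`, step size, and the feature matrix `X` (whose fourth column duplicates the first, so the columns are linearly dependent) are all assumptions chosen for illustration. The symbols `A` and `b` follow the standard notation for the mean update and are not quoted from the paper. Sampling noise is deliberately left out, so this illustrates only the mean dynamics, not the almost sure result itself.

```python
import numpy as np


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    d = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        d = d @ P
    return d


def expected_td_iterates(X, P, r, gamma, w0, step=0.5, iters=5000):
    """Euler steps of dw/dt = A w + b, with A = X^T D (gamma P - I) X and b = X^T D r."""
    D = np.diag(stationary_distribution(P))
    A = X.T @ D @ (gamma * P - np.eye(P.shape[0])) @ X
    b = X.T @ D @ r
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(iters):
        w = w + step * (A @ w + b)
    return w


# Hypothetical 3-state Markov reward process (illustrative numbers only).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Four features for three states; column 4 repeats column 1,
# so the feature columns are linearly dependent.
X = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

w_a = expected_td_iterates(X, P, r, gamma, w0=[0.0, 0.0, 0.0, 0.0])
w_b = expected_td_iterates(X, P, r, gamma, w0=[5.0, -3.0, 2.0, 1.0])

# Because X has full row rank here, the limiting value estimate is the true value function.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

print("weights from start a:", w_a)
print("weights from start b:", w_b)
print("values from start a: ", X @ w_a)
print("values from start b: ", X @ w_b)
```

In this sketch the two runs end at different weight vectors but produce the same value estimate `X @ w`. The component of the weights along the null direction of `X`, here `(1, 0, 0, -1)`, is never changed by the update, which is one simple way to see why the weights can only be expected to converge to a set rather than to a single point.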

Paper Structure

This paper contains 26 sections, 20 theorems, and 123 equations.

Introduction
Background
TD Fixed Points
ODE Solutions
Value Convergence
Weight Convergence
Bounded Invariant Sets
Convergence of Linear TD
Related Work
Conclusion
Mathematical Background
Proofs for the Section "TD Fixed Points"
Proof of the Lemma on the Projection Matrix
Proof of the Lemma on the Contraction Operator
Proof of the Lemma on Value Equivalence for TD Fixed Points
...and 11 more sections

Key Result

Lemma 1

Let the irreducibility assumption hold. Let $(\cdot)^\dagger$ denote the pseudo-inverse. Then,

Theorems & Definitions (21)

Lemma 1
Lemma 2
Lemma 3
Theorem 1
Theorem 2
Lemma 4
Theorem 3
Lemma 5
Lemma 6
Lemma 7
...and 11 more
