A Concentration Bound for TD(0) with Function Approximation

Siddharth Chandak; Vivek S. Borkar

TL;DR

This work addresses the challenge of obtaining non-asymptotic, uniform all-time concentration guarantees for online TD$(0)$ with linear function approximation when learning from a single Markov path. It treats TD$(0)$ as a contractive stochastic approximation with both martingale and Markov noise, and employs the Poisson equation to manage Markov noise along with relaxed martingale concentration inequalities to bypass the need for almost-sure boundedness of iterates. The main result provides an explicit all-time bound that decays with time and holds for all $m\ge n_0$ with high probability, with a corollary giving a mean-square rate under a standard $a(n)=d_1/(n+1)$ stepsize. These uniform finite-time guarantees advance the reliability of TD learning with function approximation in online, Markovian settings and offer a framework that could extend to other contractive stochastic approximation algorithms influenced by Markov noise.

Abstract

We derive a uniform all-time concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from that of offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. Markov noise is handled using the Poisson equation, and the lack of almost-sure guarantees on boundedness of the iterates is handled using relaxed concentration inequalities.
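To make the setting concrete, the following is a minimal sketch of online TD(0) with linear features driven by a single trajectory of a Markov chain, using the stepsize $a(n)=d_1/(n+1)$. It is an illustration under stated assumptions, not the authors' code: the function names `td0_linear` and `td_fixed_point`, the state-dependent reward vector `r`, the toy three-state chain, and the value of `d1` are all hypothetical choices made for this example.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma, d1, n_steps, seed=0):
    """Online TD(0) with linear features along ONE sample path.

    P     : (S, S) transition matrix of the Markov chain
    r     : (S,)   reward received in each state (assumed state-dependent)
    Phi   : (S, d) feature matrix, row s is the feature vector of state s
    gamma : discount factor in (0, 1)
    d1    : stepsize constant, a(n) = d1 / (n + 1)
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)
    s = rng.integers(n_states)  # arbitrary initial state, not stationary
    for n in range(n_steps):
        # Next state comes from the same trajectory (Markov noise),
        # not from an independent draw of the stationary distribution.
        s_next = rng.choice(n_states, p=P[s])
        a_n = d1 / (n + 1)
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + a_n * td_error * Phi[s]
        s = s_next
    return theta


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r for the TD(0) limit."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


# Toy example (hypothetical): 3 states, 2 features.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
r = np.array([0.0, 0.0, 1.0])
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])

theta_hat = td0_linear(P, r, Phi, gamma=0.5, d1=4.0, n_steps=20000)
theta_star = td_fixed_point(P, r, Phi, gamma=0.5)
print("TD(0) iterate :", theta_hat)
print("fixed point   :", theta_star)
print("error norm    :", np.linalg.norm(theta_hat - theta_star))
```

The distance printed on the last line is the quantity that an all-time concentration bound controls simultaneously for every $n \geq n_0$, rather than at one fixed iteration.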

Paper Structure (15 sections, 6 theorems, 96 equations)

Introduction
Related Works
Outline and Notation
Background on TD(0)
Assumptions
Formulation as a Stochastic Approximation Iteration
Main Result
Proof of the Main Result
Conclusions
Appendix A: A Martingale Inequality
Appendix B: Technical Proofs
Proof of Lemma 1
Proof of Lemma \ref{lemma:U_W}
Proof of Lemma \ref{lemma:bound_p_m}
Proof of Corollary 1

Key Result

Lemma 1

For any $x,z\in\mathbb{R}^d$, $\left\|\sum_{s\in\mathcal{S}}\pi(s)\big(F(x,s)-F(z,s)\big)\right\|\le\alpha\|x-z\|$. Moreover, $0<\alpha<1$ and hence the function $\sum_{s\in\mathcal{S}}\pi(s)F(\cdot,s)$ is a contraction.
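For orientation, one standard way to cast TD(0) as a stochastic approximation iteration with a map $F(\cdot,s)$ is sketched here. The notation is an assumption made for this illustration and may differ from the paper's: $\varphi(s)\in\mathbb{R}^d$ is the feature vector of state $s$, $r(s)$ a state-dependent reward, $\gamma\in(0,1)$ the discount factor, $p(s'|s)$ the transition probabilities, and $Y_n$ the state of the chain at time $n$.

$$
\begin{aligned}
x_{n+1} &= x_n + a(n)\,\varphi(Y_n)\big(r(Y_n) + \gamma\,\varphi(Y_{n+1})^\top x_n - \varphi(Y_n)^\top x_n\big) \\
        &= x_n + a(n)\big(F(x_n,Y_n) - x_n + M_{n+1}\big), \\
F(x,s)  &= x + \varphi(s)\Big(r(s) + \gamma\sum_{s'\in\mathcal{S}} p(s'|s)\,\varphi(s')^\top x - \varphi(s)^\top x\Big), \\
M_{n+1} &= \gamma\,\varphi(Y_n)\Big(\varphi(Y_{n+1})^\top x_n - \sum_{s'\in\mathcal{S}} p(s'|Y_n)\,\varphi(s')^\top x_n\Big).
\end{aligned}
$$

In this form $M_{n+1}$ is a martingale difference term, while the dependence of $F(x_n,Y_n)$ on the current state $Y_n$, rather than on a draw from $\pi$, is the Markov noise. The averaged map $\sum_{s\in\mathcal{S}}\pi(s)F(\cdot,s)$ is the object whose contraction property the lemma asserts.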

Theorems & Definitions (14)

Lemma 1
Theorem 1
Remark 1
Remark 2
Remark 3
Remark 4
Corollary 1
Proof of Theorem 1
Lemma 2
Lemma 3
...and 4 more
