
A Concentration Bound for TD(0) with Function Approximation

Siddharth Chandak, Vivek S. Borkar

TL;DR

This work addresses the challenge of obtaining non-asymptotic, uniform all-time concentration guarantees for online TD$(0)$ with linear function approximation when learning from a single Markov path. It treats TD$(0)$ as a contractive stochastic approximation with both martingale and Markov noise, and employs the Poisson equation to manage Markov noise along with relaxed martingale concentration inequalities to bypass the need for almost-sure boundedness of iterates. The main result provides an explicit all-time bound that decays with time and holds for all $m\ge n_0$ with high probability, with a corollary giving a mean-square rate under a standard $a(n)=d_1/(n+1)$ stepsize. These uniform finite-time guarantees advance the reliability of TD learning with function approximation in online, Markovian settings and offer a framework that could extend to other contractive stochastic approximation algorithms influenced by Markov noise.

Abstract

We derive a uniform all-time concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning, using samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from that of offline TD learning or of TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm with both martingale and Markov noise. The Markov noise is handled using the Poisson equation, and the lack of almost-sure guarantees on the boundedness of the iterates is handled using relaxed concentration inequalities.
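To make the setting concrete, the following is a minimal sketch of online TD(0) with linear function approximation driven by a single trajectory, using the stepsize $a(n)=d_1/(n+1)$ mentioned in the TL;DR. The toy chain `P`, rewards `r`, feature matrix `Phi`, discount `gamma`, and all function names are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi^T P = pi^T with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi


def td0_linear(P, r, Phi, gamma, d1, n_steps, seed=0):
    """Online TD(0) with linear features along ONE sample path of P."""
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    x = np.zeros(d)                      # weight vector (the iterate)
    s = int(rng.integers(n_states))      # arbitrary initial state
    for n in range(n_steps):
        s_next = int(rng.choice(n_states, p=P[s]))   # next state on the same path
        a_n = d1 / (n + 1)                            # stepsize a(n) = d1 / (n + 1)
        td_error = r[s] + gamma * (Phi[s_next] @ x) - Phi[s] @ x
        x = x + a_n * td_error * Phi[s]
        s = s_next
    return x


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (r + gamma P Phi x - Phi x) = 0 for x, with D = diag(pi)."""
    pi = stationary_distribution(P)
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(len(pi)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


# Hypothetical 3-state chain with 2-dimensional features (illustration only).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
gamma = 0.9

x_hat = td0_linear(P, r, Phi, gamma, d1=1.0, n_steps=5000)
x_star = td_fixed_point(P, r, Phi, gamma)
print("TD(0) iterate after 5000 steps:", x_hat)
print("TD fixed point:                ", x_star)
```

Each update uses only the transition just observed on the trajectory, so consecutive samples are correlated; this is the Markov noise that the abstract distinguishes from the i.i.d. sampling setting. How close `x_hat` gets to `x_star` in a given run depends on `d1` and on the chain, so the sketch prints both rather than asserting a particular error.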

Paper Structure

This paper contains 15 sections, 6 theorems, 96 equations.

Key Result

Lemma 1

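For any $x,z\in\mathbb{R}^d$, $\left\|\sum_{s\in\mathcal{S}}\pi(s)\big(F(x,s)-F(z,s)\big)\right\|\le\alpha\|x-z\|$, where $\alpha$ is a constant whose explicit expression is given in the paper. Moreover, $0<\alpha<1$ and hence the function $\sum_{s\in\mathcal{S}}\pi(s)F(\cdot,s)$ is a contraction.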
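A contraction property of this kind can be checked numerically on a toy problem. The sketch below assumes a hypothetical form of the map, $F(x,s)=x+\phi(s)\big(r(s)+\gamma\sum_{s'}p(s'|s)\phi(s')^{\top}x-\phi(s)^{\top}x\big)$, so that $\sum_{s}\pi(s)F(x,s)=(I-A)x+b$ with $A=\Phi^{\top}D(I-\gamma P)\Phi$ and $D=\mathrm{diag}(\pi)$. The paper's own definition of $F$, its norm, and its feature normalization may differ; the chain, the features, and the scale factor `0.2` are illustrative assumptions, and the Euclidean norm is used throughout.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi^T P = pi^T with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi


def averaged_map(P, r, Phi, gamma):
    """Return (M, b) with sum_s pi(s) F(x, s) = M x + b under the ASSUMED F."""
    pi = stationary_distribution(P)
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(len(pi)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.eye(Phi.shape[1]) - A, b


# Hypothetical 3-state chain; features are scaled down so that
# max_s ||phi(s)||^2 = 0.04 < 2 (1 - gamma) / (1 + gamma)^2 for gamma = 0.9.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([1.0, 0.0, -1.0])
Phi = 0.2 * np.array([[1.0, 0.0],
                      [0.5, 0.5],
                      [0.0, 1.0]])
gamma = 0.9

M, b = averaged_map(P, r, Phi, gamma)
alpha = np.linalg.norm(M, 2)     # spectral norm = Lipschitz constant of x -> M x + b

# Spot-check the contraction inequality on one random pair of points.
rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)
lhs = np.linalg.norm((M @ x + b) - (M @ z + b))
rhs = alpha * np.linalg.norm(x - z)
print("alpha =", alpha)
print("||T(x) - T(z)|| =", lhs, "<= alpha ||x - z|| =", rhs)
```

The feature scaling matters in this sketch: with unscaled features the spectral norm of $I-A$ need not be below one, so the choice of `0.2` is part of the assumption rather than a property of TD(0) in general.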

Theorems & Definitions (14)

  • Lemma 1
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Corollary 1
  • Proof of Theorem 1
  • Lemma 2
  • Lemma 3
  • ...and 4 more