
Towards Parameter-Free Temporal Difference Learning

Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

TL;DR

This work proposes a regularized TD(0) algorithm with an exponential step-size schedule that, under Markovian sampling, achieves a convergence rate comparable to prior work without requiring projections, iterate averaging, or knowledge of \(\tau_{\text{mix}}\) or \(\omega\).
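For orientation, one plausible form of such an update is written out below. The TD(0) step with linear features \(\phi\) is standard; the ridge-style term \(-\lambda w_t\) and the schedule constants \(\eta_0\), \(\beta\), \(T\) are assumptions for illustration (the schedule follows the exponential step sizes used in the SGD literature), not necessarily the paper's exact choices.

```latex
\delta_t = r_t + \gamma\, \phi(s_{t+1})^\top w_t - \phi(s_t)^\top w_t,
\qquad
w_{t+1} = w_t + \eta_t \bigl(\delta_t\, \phi(s_t) - \lambda\, w_t\bigr),
\qquad
\eta_t = \eta_0\, \alpha^{t}, \quad \alpha = \Bigl(\frac{\beta}{T}\Bigr)^{1/T}.
```

Under this assumed form, only \(\eta_0\), \(\beta\), and the horizon \(T\) enter the schedule, which is what allows it to be set without \(\omega\) or \(\tau_{\text{mix}}\).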

Abstract

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm's parameters using problem-dependent quantities that are difficult to estimate in practice, such as the minimum eigenvalue of the feature covariance (\(\omega\)) or the mixing time of the underlying Markov chain (\(\tau_{\text{mix}}\)). In addition, some analyses rely on nonstandard and impractical modifications, widening the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary distribution, and the more practical Markovian sampling along a single trajectory. In the i.i.d. setting, the proposed algorithm does not require knowledge of problem-dependent quantities such as \(\omega\), and it attains the optimal bias-variance trade-off for the last iterate. In the Markovian setting, we propose a regularized TD(0) algorithm with an exponential step-size schedule. The resulting algorithm achieves a convergence rate comparable to prior work without requiring projections, iterate averaging, or knowledge of \(\tau_{\text{mix}}\) or \(\omega\).
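The algorithmic idea in the abstract can be made concrete with a small simulation. The sketch below runs TD(0) with linear features along a single Markov trajectory, using an exponentially decaying step size that depends only on \(\eta_0\) and the horizon \(T\), and returns the last iterate without projection or averaging. The schedule constants, the ridge-style regularizer `lam`, the helper names `exp_step_sizes` and `td0_linear`, and the toy three-state chain are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def exp_step_sizes(eta0: float, T: int, beta: float = 1.0) -> np.ndarray:
    """Hypothetical exponential schedule eta_t = eta0 * alpha**t, alpha = (beta / T)**(1 / T).

    Only eta0, beta and the horizon T are needed; no omega or tau_mix.
    """
    alpha = (beta / T) ** (1.0 / T)
    return eta0 * alpha ** np.arange(T)


def td0_linear(P, R, Phi, gamma, T, eta0=0.5, lam=0.0, seed=0):
    """TD(0) with linear features along a single Markov trajectory.

    P: (n, n) transition matrix of the chain induced by the evaluated policy.
    R: (n,) reward received when leaving each state.
    Phi: (n, d) feature matrix, so that V_w(s) = Phi[s] @ w.
    lam: hypothetical regularization strength; lam=0 recovers plain TD(0).
    Returns the last iterate: no projection and no iterate averaging.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    w = np.zeros(d)
    etas = exp_step_sizes(eta0, T)
    s = rng.integers(n)
    for t in range(T):
        s_next = rng.choice(n, p=P[s])
        # TD error: r(s) + gamma * V_w(s') - V_w(s)
        delta = R[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
        # Semi-gradient TD step plus an (assumed) ridge-style shrinkage term
        w = w + etas[t] * (delta * Phi[s] - lam * w)
        s = s_next
    return w


# Toy 3-state chain; P is doubly stochastic, so the stationary distribution is uniform.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.5
Phi = np.eye(3)  # tabular features, so the TD fixed point is the true value function

V_true = np.linalg.solve(np.eye(3) - gamma * P, R)
w_hat = td0_linear(P, R, Phi, gamma, T=20_000)
print("max |w_T - V|:", np.max(np.abs(w_hat - V_true)))
```

With tabular features the plain TD(0) fixed point coincides with the true value function, so the printed error should be small. Setting `lam` above zero instead drives the iterate toward the regularized fixed point \((A + \lambda I)^{-1} b\), where \(A = \Phi^\top D(\Phi - \gamma P \Phi)\), \(b = \Phi^\top D R\), and \(D\) is the diagonal matrix of the stationary distribution.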

Paper Structure

This paper contains 22 sections, 38 theorems, 137 equations, and 1 table.

Key Result

Lemma 3.0

[Lemma 3 from Bhandari2018AFT] Under i.i.d. sampling, for all \(w, w^* \in \mathbb{R}^d\),

Theorems & Definitions (65)

  • Lemma 3.0
  • Lemma 3.0
  • Lemma 3.0
  • Theorem 3.1
  • Definition 4.0
  • Lemma 4.0
  • Lemma 4.0
  • Lemma 4.0
  • Lemma 4.0
  • Lemma 4.0
  • ...and 55 more