Towards Parameter-Free Temporal Difference Learning

Yunxiang Li; Mark Schmidt; Reza Babanezhad; Sharan Vaswani

TL;DR

This work proposes a regularized TD(0) algorithm with an exponential step-size schedule that achieves a comparable convergence rate to prior works, without requiring projections, iterate averaging, or knowledge of $\tau_{\text{mix}}$ or $\omega$.

Abstract

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However, they often require setting the algorithm parameters using problem-dependent quantities that are difficult to estimate in practice -- such as the minimum eigenvalue of the feature covariance ($\omega$) or the mixing time of the underlying Markov chain ($\tau_{\text{mix}}$). In addition, some analyses rely on nonstandard and impractical modifications, exacerbating the gap between theory and practice. To address these limitations, we use an exponential step-size schedule with the standard TD(0) algorithm. We analyze the resulting method under two sampling regimes: independent and identically distributed (i.i.d.) sampling from the stationary distribution, and the more practical Markovian sampling along a single trajectory. In the i.i.d. setting, the proposed algorithm does not require knowledge of problem-dependent quantities such as $\omega$, and attains the optimal bias-variance trade-off for the last iterate. In the Markovian setting, we propose a regularized TD(0) algorithm with an exponential step-size schedule. The resulting algorithm achieves a comparable convergence rate to prior works, without requiring projections, iterate averaging, or knowledge of $\tau_{\text{mix}}$ or $\omega$.
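
To make the setup in the abstract concrete, the following is a minimal sketch of standard (unregularized) TD(0) with linear value function approximation, run along a single Markovian trajectory and returning the last iterate without projection or averaging. The abstract does not state the form of the schedule, so the code assumes the common exponential step-size $\eta_t = \eta_0 \alpha^t$ with $\alpha = (\beta / T)^{1/T}$; the function names, the parameter `beta`, and the two-state reward process are hypothetical and chosen only for illustration. It is not the regularized variant proposed for the Markovian setting.

```python
import random


def exponential_step_sizes(eta0, T, beta=1.0):
    """Assumed schedule (not stated in the abstract):
    eta_t = eta0 * alpha**t for t = 1..T, with alpha = (beta / T) ** (1 / T)."""
    alpha = (beta / T) ** (1.0 / T)
    return [eta0 * alpha ** t for t in range(1, T + 1)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def td0_linear(features, P, rewards, gamma, T, eta0, seed=0):
    """Standard TD(0) with linear features on a Markov reward process.

    States are sampled along one trajectory (Markovian sampling).
    Returns the last iterate: no projection, no iterate averaging.
    """
    rng = random.Random(seed)
    n = len(features)
    w = [0.0] * len(features[0])
    s = 0  # arbitrary start state
    for eta in exponential_step_sizes(eta0, T):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n), weights=P[s])[0]
        # TD error: r(s) + gamma * V_w(s') - V_w(s).
        delta = rewards[s] + gamma * dot(w, features[s_next]) - dot(w, features[s])
        # Semi-gradient update along the feature vector of the current state.
        w = [wi + eta * delta * xi for wi, xi in zip(w, features[s])]
        s = s_next
    return w


# Hypothetical two-state example with one-hot (tabular) features.
features = [[1.0, 0.0], [0.0, 1.0]]
P = [[0.5, 0.5], [0.5, 0.5]]
rewards = [1.0, 0.0]
w_last = td0_linear(features, P, rewards, gamma=0.5, T=20000, eta0=0.5)
print(w_last)
```

For this toy process the true values solve $V = r + \gamma P V$, giving $V = (1.5, 0.5)$, so the printed weights should land close to those numbers. Note that the only inputs to the schedule are $\eta_0$, $\beta$, and the horizon $T$; nothing in the sketch uses $\omega$ or $\tau_{\text{mix}}$.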

Paper Structure (22 sections, 38 theorems, 137 equations, 1 table)

Introduction
Problem Formulation
Markov decision process.
Linear value function approximation.
Exponential step-size schedule.
Exponential step-size with i.i.d. sampling
Handling Markovian sampling
Standard TD(0)
Regularized TD(0)
Technical novelty compared to mitra2025a.
Conclusion
Additional Inequalities Used in the Proofs
$-g$ is $2$-Lipschitz.
$-g_t$ is $2$-Lipschitz.
Equations 7 and 8 in mitra2025a.
...and 7 more sections

Key Result

Lemma 3.0

[Lemma 3 from Bhandari2018AFT] Under i.i.d. sampling, $\forall w, w^* \in \mathbb{R}^d$,

Theorems & Definitions (65)

Lemma 3.0
Lemma 3.0
Lemma 3.0
Theorem 3.1
Definition 4.0
Lemma 4.0
Lemma 4.0
Lemma 4.0
Lemma 4.0
Lemma 4.0
...and 55 more
