The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu; Shuhang Chen; Shangtong Zhang

TL;DR

The paper extends the Borkar–Meyn stability theorem for stochastic approximation from martingale-difference noise to Markovian noise, which covers RL algorithms that use off-policy data and eligibility traces. The analysis introduces a diminishing asymptotic rate of change and combines it with a subsequence argument based on the Arzelà–Ascoli theorem to prove that the iterates are almost surely bounded and converge to invariant sets of the limiting ODE. The results give almost-sure convergence guarantees for GTD(λ) and ETD(λ) in off-policy RL with linear function approximation, even when the traces are unbounded, and reduce reliance on projections or restrictive drift conditions. The framework also leaves room for future work on convergence rates, central limit theorems, and further relaxation of the assumptions.

Abstract

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a form of the law of the iterated logarithm.
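To make the notion of stability under Markovian noise concrete, here is a minimal sketch of a scalar stochastic approximation recursion of the generic form x_{n+1} = x_n + α_n H(x_n, Y_{n+1}), where the noise Y is a Markov chain rather than an i.i.d. or martingale-difference sequence. Everything specific in it is an illustrative assumption and not taken from the paper: the two-state chain `P`, the targets `b`, the choice H(x, y) = b[y] - x, and the step size α_n = 1/(n+1). Under these choices the stationary distribution is (2/3, 1/3), so the limiting ODE is dx/dt = 2 - x with equilibrium x = 2.

```python
import random


def run_sa(num_steps=200_000, seed=0):
    """Scalar stochastic approximation driven by a two-state Markov chain.

    All constants below are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    # Transition matrix of the noise chain Y_n (assumed for this sketch).
    P = [[0.9, 0.1], [0.2, 0.8]]
    # State-dependent targets; H(x, y) = b[y] - x (assumed for this sketch).
    b = [1.0, 4.0]

    x, y = 0.0, 0
    max_abs = abs(x)  # track sup_n |x_n| as an empirical stability check
    for n in range(num_steps):
        # Sample Y_{n+1} given Y_n: the noise is Markovian, not i.i.d.
        y = 0 if rng.random() < P[y][0] else 1
        alpha = 1.0 / (n + 1)  # diminishing step size
        x += alpha * (b[y] - x)
        max_abs = max(max_abs, abs(x))
    return x, max_abs


final_x, max_abs_x = run_sa()
print(final_x, max_abs_x)
```

In this toy case the iterate is just the running average of b[Y_n], so boundedness is immediate. The paper's contribution concerns settings where no such shortcut exists, for example when H grows with x or the noise (such as an eligibility trace) is unbounded.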

Paper Structure

This paper contains 16 sections, 20 theorems, and 105 equations.

Introduction
Main Results
Related Work
Proof of the Stability Theorem
Diminishing Asymptotic Rate of Change
Equicontinuity of Scaled Iterates
A Convergent Subsequence
Diminishing Discretization Error
Identifying Contradiction and Completing Proof
Applications in Reinforcement Learning
Eligibility Trace
The Deadly Triad
Gradient Temporal Difference Learning
Emphatic Temporal Difference Learning
Conclusion
...and 1 more sections

Key Result

Theorem 7

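Let the standing assumptions hold, from the existence of a stationary distribution through the uniform convergence of the limit of $h$. In addition, let either the law-of-large-numbers assumption or the Poisson assumption hold. Then the iterates $\{x_n\}$ generated by the update rule for $x_n$ are stable, i.e., $\sup_n \|x_n\| < \infty$ almost surely.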

Theorems & Definitions (30)

Remark 1
Remark 2
Remark 3
Remark 4
Remark 5
Remark 6
Theorem 7
Corollary 8
Lemma 9
Lemma 10
...and 20 more
