Asymptotic and Finite Sample Analysis of Nonexpansive Stochastic Approximations with Markovian Noise

Ethan Blaser; Shangtong Zhang

TL;DR

This work investigates stochastic approximations with merely nonexpansive operators, and proves for the first time that classical tabular average reward temporal difference learning converges to a sample-path dependent fixed point.

Abstract

Stochastic approximation is a powerful class of algorithms with celebrated success. However, a large body of previous analysis focuses on stochastic approximations driven by contractive operators, which is not applicable in some important reinforcement learning settings like the average reward setting. This work instead investigates stochastic approximations with merely nonexpansive operators. In particular, we study nonexpansive stochastic approximations with Markovian noise, providing both asymptotic and finite sample analysis. Key to our analysis are novel bounds of noise terms resulting from the Poisson equation. As an application, we prove for the first time that classical tabular average reward temporal difference learning converges to a sample-path dependent fixed point.
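As a rough illustration of the application named in the abstract, the following is a minimal sketch of tabular average reward TD learning in its common textbook form. It is not taken from the paper: the two-state chain, the reward vector, the step-size exponent, and the function name `average_reward_td` are all hypothetical choices made for this example, and the paper's exact update rule and step-size conditions may differ.

```python
import numpy as np


def average_reward_td(P, r, num_steps=100_000, seed=0):
    """Tabular average reward TD(0) on a Markov reward process.

    P is a row-stochastic transition matrix; r[s] is the reward received
    when leaving state s (a simplifying assumption for this sketch).
    """
    rng = np.random.default_rng(seed)
    num_states = P.shape[0]
    v = np.zeros(num_states)  # differential value estimates
    J = 0.0                   # running estimate of the average reward
    s = 0
    for t in range(num_steps):
        alpha = 1.0 / (t + 1) ** 0.8  # illustrative decaying step size
        s_next = rng.choice(num_states, p=P[s])
        # The TD error subtracts the average reward estimate instead of discounting.
        td_error = r[s] - J + v[s_next] - v[s]
        v[s] += alpha * td_error
        J += alpha * (r[s] - J)
        s = s_next
    return J, v


# Hypothetical two-state chain with stationary distribution (4/7, 3/7).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
r = np.array([1.0, 0.0])

J, v = average_reward_td(P, r)
print("average reward estimate:", J)   # the true average reward is 4/7
print("v[0] - v[1]:", v[0] - v[1])     # the true difference is 10/7
```

In this sketch, adding the same constant to every entry of `v` leaves the expected update unchanged, so only differences such as `v[0] - v[1]` are pinned down. The particular constant the iterates settle on can vary with the random seed, which is one informal way to read the phrase "sample-path dependent fixed point".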

Paper Structure (29 sections, 29 theorems, 190 equations, 1 table)

Introduction
Notations
Asymptotic Analysis of SKM Iterations
Finite Sample Analysis of SKM Iterations
Application in Average Reward Temporal Difference Learning
Reinforcement Learning Background
Average Reward Temporal Difference Learning
Significance of Theorem 4.2
Related Work
ODE and Lyapunov Methods for Asymptotic Convergence
Average Reward RL
Conclusion
Acknowledgments
Mathematical Background
Additional Lemmas from Section \ref{sec:SA}
...and 14 more sections

Key Result

Theorem 2.6

Let Assumptions as:steadystate - as:e1 hold. Then the iterates $\{x_n\}$ generated by eq:skm_markov satisfy $\lim_{n \to \infty} x_n = x_*$ almost surely, where $x_* \in \mathcal{X}_*$ is a possibly sample-path dependent fixed point. More precisely, let $\omega$ denote a sample path $(w_0, Y_0, Y_1, \dots)$ and write $x_n(\omega)$ to emphasize the dependence of $x_n$ on $\omega$. Then there exists a set $\Omega$ of sample paths

Theorems & Definitions (63)

Theorem 2.6
proof
Remark 2.7
Theorem 3.1
proof
Remark 3.2
Theorem 4.2
proof
Remark 4.3
Lemma A.1: Theorem 2.1 from bravo2024stochastic
...and 53 more
