
Asynchronous Stochastic Approximation with Applications to Average-Reward Reinforcement Learning

Huizhen Yu, Yi Wan, Richard S. Sutton

TL;DR

This work addresses stability and convergence of asynchronous stochastic approximation in the context of average-reward reinforcement learning. It extends the Borkar–Meyn stability framework to general noise via stopping-time methods and proves convergence of asynchronous SA by linking the iterates to ODE dynamics. It then sharpens these results through a shadowing analysis based on Hirsch–Benaïm theory, identifying conditions under which the iterates converge to a unique equilibrium. The paper provides three main results: a stability theorem under general noise (Thm 1), a convergence theorem to invariant sets of the limiting ODE (Thm 2), and a shadowing-based theorem ensuring convergence to a unique equilibrium within E_h, the equilibrium set of the limiting ODE (Thm 3). These theoretical guarantees underpin relative value iteration-based RL algorithms for solving average-reward MDPs and SMDPs and set the stage for their design and analysis in companion works (YWS25, WYS24).
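To make the asynchronous setting concrete, here is a minimal sketch of an asynchronous SA iteration. It assumes a hypothetical linear mean field h(x) = b - Ax and Gaussian martingale-difference noise, neither of which is taken from the paper. At each step only one randomly chosen component is updated, with a step size driven by that component's own update count (one common choice in asynchronous SA). Under such conditions the iterates are expected to track the ODE dx/dt = h(x), whose unique equilibrium here is A^{-1}b.

```python
import random


def async_sa(A, b, num_iters=30000, noise_std=0.5, seed=0):
    """Asynchronous SA for the root of the hypothetical field h(x) = b - A x."""
    rng = random.Random(seed)
    d = len(b)
    x = [0.0] * d
    counts = [0] * d  # nu(n, i): how many times component i has been updated so far
    for _ in range(num_iters):
        i = rng.randrange(d)  # only one component is updated at this step
        counts[i] += 1
        step = 1.0 / counts[i]  # local step size a(nu(n, i)) = 1 / nu(n, i)
        h_i = b[i] - sum(A[i][j] * x[j] for j in range(d))  # i-th component of h(x)
        noisy_h_i = h_i + rng.gauss(0.0, noise_std)  # zero-mean observation noise
        x[i] += step * noisy_h_i
    return x


if __name__ == "__main__":
    # Hypothetical diagonally dominant A; b is built so that the root is x_star.
    A = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]
    x_star = [1.0, -1.0, 0.5]
    b = [sum(A[i][j] * x_star[j] for j in range(3)) for i in range(3)]
    print(async_sa(A, b))  # should be close to x_star
```

Using a per-component counter rather than a global one keeps each coordinate's effective step schedule independent of how often the other coordinates happen to be sampled.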

Abstract

This paper investigates the stability and convergence properties of asynchronous stochastic approximation (SA) algorithms, with a focus on extensions relevant to average-reward reinforcement learning. We first extend a stability proof method of Borkar and Meyn to accommodate more general noise conditions than previously considered, thereby yielding broader convergence guarantees for asynchronous SA. To sharpen the convergence analysis, we further examine the shadowing properties of asynchronous SA, building on a dynamical systems approach of Hirsch and Benaïm. These results provide a theoretical foundation for a class of relative value iteration-based reinforcement learning algorithms -- developed and analyzed in a companion paper -- for solving average-reward Markov and semi-Markov decision processes.
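As a reference point for the kind of algorithm these results support, here is a minimal sketch of the classical tabular RVI Q-learning update on a hypothetical two-state, two-action deterministic MDP. It is not one of the algorithms from the companion paper. The update is Q(s,a) ← Q(s,a) + α(r + max_b Q(s',b) - f(Q) - Q(s,a)). Here the offset f(Q) is assumed to be Q at a fixed reference pair, and the step-size schedule α = 10/(ν + 100) is likewise an illustrative assumption. Because only the visited pair changes at each step, this is an asynchronous SA scheme. For this MDP the optimal average reward is 2 (stay in state 1 forever), so f(Q) should approach 2.

```python
import random

# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
TRANSITIONS = {
    (0, 0): (0, 1.0),  # stay in state 0, reward 1
    (0, 1): (1, 0.0),  # move to state 1, reward 0
    (1, 0): (1, 2.0),  # stay in state 1, reward 2
    (1, 1): (0, 0.0),  # move to state 0, reward 0
}


def rvi_q_learning(num_steps=50000, ref=(0, 0), seed=0):
    """Tabular RVI Q-learning with f(Q) = Q[ref] and a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in TRANSITIONS}
    visits = {sa: 0 for sa in TRANSITIONS}  # per-pair update counts (local clocks)
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(2)
        s_next, r = TRANSITIONS[(s, a)]
        visits[(s, a)] += 1
        step = 10.0 / (visits[(s, a)] + 100)  # hypothetical diminishing step size
        f_q = q[ref]  # relative-value offset f(Q)
        target = r + max(q[(s_next, b)] for b in range(2)) - f_q
        q[(s, a)] += step * (target - q[(s, a)])  # only the visited pair is updated
        s = s_next
    return q, q[ref]


if __name__ == "__main__":
    q, gain_estimate = rvi_q_learning()
    print(gain_estimate)  # should be close to the optimal average reward 2
```

With f(Q) = Q(0,0), the unique fixed point of this update is Q(0,0) = 2, Q(0,1) = 3, Q(1,0) = 5, Q(1,1) = 1. One can check this by substituting these values into the update target for each pair.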

Paper Structure

This paper contains 20 sections, 27 theorems, and 119 equations.

Key Result

Theorem 2.1

For algorithm (eq-alg0), under Assumptions cond-h through cond-us, the iterates $\{x_n\}$ are bounded almost surely.
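For orientation, the classical Borkar–Meyn criterion that this stability result builds on can be stated as follows. This is the standard form from the SA literature, written for a generic mean field $h$. It is not necessarily identical to the paper's own assumptions cond-h through cond-us.

$$
h_c(x) := \frac{h(cx)}{c}, \quad c \ge 1, \qquad h_\infty(x) := \lim_{c \to \infty} h_c(x),
$$

Here the limit is assumed to exist uniformly on compact sets. The condition is that the origin is the globally asymptotically stable equilibrium of $\dot{x}(t) = h_\infty(x(t))$. For example, for an affine field $h(x) = b - Ax$ one gets $h_c(x) = b/c - Ax \to -Ax$, so the condition reduces to every eigenvalue of $A$ having a positive real part.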

Theorems & Definitions (60)

  • Remark 2.1
  • Remark 2.2
  • Remark 2.3
  • Theorem 2.1
  • Theorem 2.2
  • Lemma 2.1
  • Proof
  • Corollary 2.1
  • Theorem 2.3
  • Remark 2.4: On the proof
  • ...and 50 more