A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays
Huizhen Yu, Yi Wan, Richard S. Sutton
TL;DR
This work establishes stability and convergence of asynchronous stochastic approximation in $\mathbb{R}^d$ without communication delays, extending the Borkar–Meyn framework to accommodate more general noise conditions. The ODE-based analysis uses scaled iterates, time-rescaled trajectories, and the scaled drift functions $h_c$ and $h_\infty$ to prove that the iterates are almost surely bounded (stability) and converge to invariant sets of the ODE $\dot{x}(t)=h(x(t))$, with refinements describing the segment-wise behavior of the iterates. These results yield convergence guarantees for average-reward reinforcement learning algorithms, such as average-reward Q-learning, in weakly communicating MDPs/SMDPs, and so provide a principled basis for the stability and convergence of asynchronous RL algorithms under general noise structures. An alternative stability proof under stronger noise assumptions is given in the appendix. Future work includes extending the analysis to noisy delayed or distributed updates.
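For readers unfamiliar with the scaled drift functions mentioned above, the following recalls how they are usually defined in the Borkar–Meyn setting. This is a sketch under the assumption that the paper follows the standard definitions; the exact conditions and notation in the paper itself take precedence.

```latex
% Standard Borkar--Meyn scaling of the drift h : R^d -> R^d
% (assumed here to match the paper's usage of h_c and h_\infty).
\[
  h_c(x) \;=\; \frac{h(cx)}{c}, \qquad c \ge 1,
  \qquad\qquad
  h_\infty(x) \;=\; \lim_{c \to \infty} h_c(x).
\]
% Stability criterion in this framework: the origin is the unique globally
% asymptotically stable equilibrium of the limiting ODE
\[
  \dot{x}(t) \;=\; h_\infty\bigl(x(t)\bigr).
\]
```

Intuitively, $h_\infty$ captures the behavior of the drift far from the origin, so an ODE $\dot{x}(t)=h_\infty(x(t))$ that pulls trajectories toward the origin prevents the scaled iterates from escaping to infinity.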
Abstract
In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that extends a method of Borkar and Meyn by accommodating more general noise conditions. We also derive convergence results from this stability result and discuss their application in important average-reward reinforcement learning problems.
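To make the setting concrete, here is a minimal toy sketch of an asynchronous stochastic approximation iteration without communication delays: at each step only one randomly chosen component is updated, using the current values of all components and a step size driven by that component's own update count. The linear drift `h`, the matrix `A`, the target `X_STAR`, and the function name `async_sa` are all illustrative assumptions and are not taken from the paper.

```python
import random

def async_sa(h, d, num_steps, noise_std=1.0, seed=0):
    """Toy asynchronous stochastic approximation without communication delays.

    One component is updated per step; its step size depends on how many
    times that component has been updated so far (its local clock).
    """
    rng = random.Random(seed)
    x = [0.0] * d          # current iterate, shared by all components
    counts = [0] * d       # number of updates made to each component
    for _ in range(num_steps):
        i = rng.randrange(d)                 # component updated at this step
        step = 1.0 / (counts[i] + 1)         # diminishing step on the local clock
        # Noisy observation of the i-th drift component at the CURRENT iterate
        # (no delayed information about other components is used).
        noisy_drift = h(x)[i] + rng.gauss(0.0, noise_std)
        x[i] += step * noisy_drift
        counts[i] += 1
    return x

# Hypothetical stable linear drift h(x) = A x + B with equilibrium X_STAR.
A = [[-1.0, 0.3, 0.0],
     [0.0, -1.0, 0.3],
     [0.3, 0.0, -1.0]]
X_STAR = [1.0, -2.0, 0.5]
B = [-sum(A[i][j] * X_STAR[j] for j in range(3)) for i in range(3)]

def h(x):
    return [sum(A[i][j] * x[j] for j in range(3)) + B[i] for i in range(3)]

x_final = async_sa(h, d=3, num_steps=60000)
print(x_final)  # expected to lie near X_STAR
```

In this toy problem the ODE $\dot{x}(t)=h(x(t))$ has a single globally asymptotically stable equilibrium, so the iterates stay bounded and settle near it. The paper's contribution concerns far more general drifts and noise conditions than this example covers.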
