Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning
Xinyu Liu, Zixuan Xie, Shangtong Zhang
TL;DR
The paper addresses a fundamental limitation of the Robbins-Siegmund (RS) theorem: it requires the zero-order term to be summable. The authors develop an extension that tolerates a zero-order term that is square summable but not summable, provided the increments of the process satisfy a mild growth condition. They prove almost sure convergence to a bounded set and derive nonasymptotic convergence rates, high-probability concentration bounds, and $L^p$ bounds, all within a framework that accommodates time-inhomogeneous Markov noise. The theory is then applied to stochastic approximation and reinforcement learning. For Q-learning with linear function approximation, this yields the first almost sure convergence rates, the first maximal concentration bounds with exponential tails, and the first $L^p$ convergence rates. These results make it possible to analyze algorithms that fall outside the scope of the standard RS theorem, with practical implications for the stability of RL algorithms and the design of learning rates. Natural next steps include extending the framework to nonlinear function approximation and non-Markovian noise models.
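The gap between "summable" and "square summable but not summable" is easy to see numerically. The sketch below uses $b_n = 1/n$, a standard textbook example chosen for illustration (it is not taken from the paper): its partial sums grow without bound, roughly like $\ln N$ plus the Euler-Mascheroni constant, while the partial sums of $b_n^2$ converge to $\pi^2/6$. The function name `partial_sums` is hypothetical.

```python
import math

def partial_sums(n_terms):
    """Return (sum of b_n, sum of b_n^2) for b_n = 1/n, n = 1..n_terms.

    b_n = 1/n is an illustrative choice: square summable but not summable.
    """
    s1 = 0.0  # partial sum of b_n (diverges like ln N)
    s2 = 0.0  # partial sum of b_n^2 (converges to pi^2 / 6)
    for n in range(1, n_terms + 1):
        b = 1.0 / n
        s1 += b
        s2 += b * b
    return s1, s2

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

for N in (10, 1_000, 100_000):
    s1, s2 = partial_sums(N)
    print(
        f"N={N:>7}: sum b_n = {s1:8.4f} (ln N + gamma = {math.log(N) + EULER_GAMMA:8.4f}), "
        f"sum b_n^2 = {s2:.6f} (limit pi^2/6 = {math.pi ** 2 / 6:.6f})"
    )
```

A zero-order term behaving like $b_n$ here is exactly the case the classical RS theorem excludes and the extension is designed to cover.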
Abstract
The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation: it requires the zero-order term to be summable. In many important RL applications, this summability condition cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem to almost supermartingales whose zero-order term is not summable but only square summable. In particular, we introduce a novel and mild assumption on the increments of the stochastic processes. Together with the square-summability condition, this assumption yields almost sure convergence to a bounded set. We further provide almost sure convergence rates, high-probability concentration bounds, and $L^p$ convergence rates. We then apply the new results to stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high-probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
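For reference, the classical Robbins-Siegmund theorem can be stated as follows. This is the standard textbook form; the symbols $V_n, a_n, b_n, c_n$ are our notation rather than the paper's, with $b_n$ playing the role of the zero-order term. The extension described above relaxes the requirement $\sum_n b_n < \infty$ to $\sum_n b_n^2 < \infty$ (together with the increment assumption), at the price of concluding convergence to a bounded set rather than convergence of $V_n$ itself.

```latex
Let $(\mathcal{F}_n)_{n \ge 0}$ be a filtration and let $V_n, a_n, b_n, c_n \ge 0$
be $\mathcal{F}_n$-measurable random variables satisfying
\[
  \mathbb{E}\left[ V_{n+1} \mid \mathcal{F}_n \right]
  \le (1 + a_n) V_n + b_n - c_n \qquad \text{for all } n \ge 0 .
\]
Then, almost surely on the event
$\left\{ \sum_{n} a_n < \infty,\ \sum_{n} b_n < \infty \right\}$,
the sequence $V_n$ converges to a finite random variable and
$\sum_{n} c_n < \infty$.
```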
