Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

Arman Adibi; Nicolo Dal Fabbro; Luca Schenato; Sanjeev Kulkarni; H. Vincent Poor; George J. Pappas; Hamed Hassani; Aritra Mitra

TL;DR

The theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.

Abstract

Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. While the effect of delays has been extensively studied for optimization, the manner in which they interact with the underlying Markov process to shape the finite-time performance of SA remains poorly understood. In this context, our first main contribution is to show that under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the SA operator's fixed point. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{\max}$ and the mixing time $\tau_{\mathrm{mix}}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. As such, our proof may be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, we show that the exponent of convergence of this scheme is scaled down by $\tau_{\mathrm{avg}}$, as opposed to $\tau_{\max}$ for the vanilla delayed SA rule; here, $\tau_{\mathrm{avg}}$ denotes the average delay across all iterations. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.
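To make the delayed update rule concrete, here is a minimal sketch on a hypothetical scalar linear SA problem driven by a two-state Markov chain. It assumes the vanilla delayed rule has the common form $\theta_{t+1} = \theta_t + \alpha\, g(\theta_{t-\tau_t}, o_{t-\tau_t})$; the operator $g$, the chain, the step size, and the delay model (delay 0 except for rare spikes to $\tau_{\max}$) are illustrative assumptions, chosen so that $\tau_{\mathrm{avg}}$ ends up much smaller than $\tau_{\max}$. The function name `run_delayed_sa` is hypothetical.

```python
import random

# Hypothetical two-state Markov chain (an assumption for illustration):
# the chain keeps its current state with probability STAY_PROB.
STAY_PROB = 0.9
A = (1.0, 2.0)  # A(o) for o = 0, 1
B = (1.0, 3.0)  # b(o) for o = 0, 1
# The chain is symmetric, so its stationary distribution is (0.5, 0.5) and the
# fixed point of the averaged operator is E[b] / E[A] = 2.0 / 1.5.
THETA_STAR = 2.0 / 1.5


def g(theta, o):
    """Linear SA operator g(theta, o) = b(o) - A(o) * theta."""
    return B[o] - A[o] * theta


def run_delayed_sa(alpha, T, tau_max, spike_prob, seed=0):
    """Run theta_{t+1} = theta_t + alpha * g(theta_{t-tau_t}, o_{t-tau_t}).

    The delay tau_t is 0 most of the time and jumps to tau_max (clipped at t)
    with probability spike_prob, so tau_avg can be far smaller than tau_max.
    Returns the last iterate and the realized delay sequence.
    """
    rng = random.Random(seed)
    thetas = [0.0]  # theta_0
    obs = [0]       # o_0
    delays = []
    for t in range(T):
        tau = min(t, tau_max) if rng.random() < spike_prob else 0
        delays.append(tau)
        s = t - tau  # index of the stale (iterate, observation) pair
        thetas.append(thetas[-1] + alpha * g(thetas[s], obs[s]))
        # Markovian sampling: o_{t+1} depends on o_t.
        stay = rng.random() < STAY_PROB
        obs.append(obs[-1] if stay else 1 - obs[-1])
    return thetas[-1], delays


for p in (0.0, 0.01):
    theta_T, delays = run_delayed_sa(alpha=0.002, T=20_000, tau_max=50, spike_prob=p)
    tau_avg = sum(delays) / len(delays)
    print(f"spike_prob={p}: |theta_T - theta*|={abs(theta_T - THETA_STAR):.3f}, "
          f"tau_max={max(delays)}, tau_avg={tau_avg:.2f}")
```

Both runs should end near $\theta^* = 4/3$ but not exactly on it, which is consistent with convergence of the last iterate to a ball rather than to the fixed point itself. The printed delay statistics show how a handful of spikes set $\tau_{\max}$ while barely moving $\tau_{\mathrm{avg}}$, which is the gap the delay-adaptive scheme is designed to exploit.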

Paper Structure

This paper contains 33 sections, 14 theorems, 260 equations, 1 figure, and 1 table.

INTRODUCTION
Related Works
Contributions
PROBLEM FORMULATION
SA under Markovian Sampling
Exemplar Applications
SA with delayed updates
Assumptions and Definitions
WARM UP: STOCHASTIC APPROXIMATION WITH CONSTANT DELAYS
STOCHASTIC APPROXIMATION WITH TIME-VARYING DELAYS
Overview of Our Proof Technique
Auxiliary Lemmas
DELAY-ADAPTIVE STOCHASTIC APPROXIMATION
CONCLUSIONS AND FUTURE WORK
Related Work
...and 18 more sections

Key Result

Theorem 1

Suppose Assumptions 1-3 hold. Let $w_t \triangleq (1 - 0.5\alpha\mu)^{-(t+1)}$ and $W_T = \sum_{t=0}^{T} w_t$. Let $\boldsymbol{\theta}_{out}$ be an iterate chosen randomly from $\{\boldsymbol{\theta}_t\}_{t=0}^{T}$, such that $\boldsymbol{\theta}_{out} = \boldsymbol{\theta}_t$ with probabil… with $C_{\alpha} = O\left(\frac{1}{\alpha\mu} + \frac{\bar{\tau}\sigma^2}{\mu}\right)$. Setting $\alpha$ …
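The randomized output rule in Theorem 1 can be sketched as follows. The statement above is cut off before the sampling probability, so this sketch assumes $\boldsymbol{\theta}_{out} = \boldsymbol{\theta}_t$ with probability $w_t / W_T$, the normalization suggested by the definitions of $w_t$ and $W_T$. The helper names `output_probabilities` and `sample_output` are hypothetical. Weights are handled in log space because $w_t$ grows geometrically in $t$ and can overflow for large $T$.

```python
import math
import random


def output_probabilities(alpha, mu, T):
    """Probabilities p_t proportional to w_t = (1 - 0.5*alpha*mu)^{-(t+1)}, t = 0..T.

    Computed in log space because w_t grows geometrically and can overflow.
    """
    log_r = math.log(1.0 - 0.5 * alpha * mu)
    log_w = [-(t + 1) * log_r for t in range(T + 1)]
    m = max(log_w)
    scaled = [math.exp(lw - m) for lw in log_w]  # w_t / max_s w_s
    total = sum(scaled)                          # W_T / max_s w_s
    return [x / total for x in scaled]           # w_t / W_T (assumed rule)


def sample_output(thetas, alpha, mu, seed=0):
    """Return theta_out drawn from thetas[0..T] with probability w_t / W_T."""
    probs = output_probabilities(alpha, mu, len(thetas) - 1)
    rng = random.Random(seed)
    return rng.choices(thetas, weights=probs, k=1)[0]


probs = output_probabilities(alpha=0.1, mu=0.1, T=5000)
print(probs[-1])  # close to 0.5 * alpha * mu = 0.005
```

For large $T$, the probability placed on the last iterate approaches $1 - (1 - 0.5\alpha\mu) = 0.5\alpha\mu$, so most of the mass sits on roughly the last $2/(\alpha\mu)$ iterates; the randomized output is therefore close in spirit to a late iterate.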

Figures (1)

Figure 1: Simulation performance of a TD(0) learning algorithm under delayed updates and Markovian sampling. We compare three different algorithms: a non-delayed TD learning algorithm, a vanilla-delayed algorithm (equivalent to update rule \ref{eq:delayedSA}), and a delay-adaptive algorithm (equivalent to update rule \ref{eq:algo}).
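A small experiment in the spirit of Figure 1 can be sketched with tabular TD(0). The delay-adaptive rule (eq:algo) is not specified in this section, so the sketch compares only non-delayed TD(0) with a vanilla-delayed variant that applies the TD update computed from a stale table and a stale transition. The 3-state reward process, discount factor, step size, uniform delay distribution, and the helper names `true_values` and `td0` are all hypothetical choices for illustration, not the paper's experimental setup.

```python
import random
from collections import deque

# Hypothetical 3-state Markov reward process (an assumption for illustration).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [1.0, 0.0, 2.0]  # reward received in each state
GAMMA = 0.9


def true_values(iters=2000):
    """Solve V = R + GAMMA * P V by fixed-point (value) iteration."""
    v = [0.0] * 3
    for _ in range(iters):
        v = [R[s] + GAMMA * sum(P[s][j] * v[j] for j in range(3)) for s in range(3)]
    return v


def td0(alpha, T, tau_max, seed=0):
    """Tabular TD(0) with bounded delays; tau_max = 0 gives plain TD(0).

    At time t the update uses the stale table theta_{t-tau_t} and the stale
    transition (s_{t-tau_t}, s_{t-tau_t+1}), with tau_t uniform on {0..min(t, tau_max)}.
    """
    rng = random.Random(seed)
    theta = [0.0] * 3
    s = 0
    # history[-1-k] holds (theta_{t-k}, s_{t-k}, s_{t-k+1}) for k = 0..tau_max
    history = deque(maxlen=tau_max + 1)
    for t in range(T):
        s_next = rng.choices(range(3), weights=P[s], k=1)[0]  # Markovian sample
        history.append((list(theta), s, s_next))
        tau = rng.randint(0, len(history) - 1)
        old, s_old, s_old_next = history[-1 - tau]
        delta = R[s_old] + GAMMA * old[s_old_next] - old[s_old]
        theta[s_old] += alpha * delta
        s = s_next
    return theta


v_star = true_values()
for tau_max in (0, 10):
    theta = td0(alpha=0.01, T=60_000, tau_max=tau_max)
    err = max(abs(a - b) for a, b in zip(theta, v_star))
    print(f"tau_max={tau_max}: max |theta - V| = {err:.3f}")
```

Storing only the last $\tau_{\max} + 1$ snapshots in a bounded `deque` keeps memory independent of $T$, since no update can reach further back than $\tau_{\max}$ steps.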

Theorems & Definitions (19)

Remark 1
Definition 1
Remark 2
Theorem 1
Theorem 2
Lemma 1
Lemma 2
Lemma 3
Theorem 3
Lemma 4
...and 9 more
