Finite-Time Analysis of Asynchronous Multi-Agent TD Learning

Nicolò Dal Fabbro; Arman Adibi; Aritra Mitra; George J. Pappas

TL;DR

This paper considers a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator, and studies the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays.

Abstract

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear convergence speedup, i.e., a reduction, proportional to $N$, in the number of iterations required to reach a certain convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is a finite-time analysis of AsyncMATD, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
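The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's algorithm as stated: the exact update rule is not reproduced on this page, so the code assumes the server averages the $N$ agents' delayed TD(0) directions, each evaluated at a stale parameter and a stale observation, with delays bounded by $\tau_{max}$. The ring random-walk chain, the one-hot (tabular) features, the uniformly random delays, and all function and parameter names are hypothetical choices made only for illustration.

```python
import random


def td_direction(theta, s, reward, s_next, gamma):
    """TD(0) direction for one-hot features: only the entry of state s is nonzero."""
    delta = reward + gamma * theta[s_next] - theta[s]
    direction = [0.0] * len(theta)
    direction[s] = delta
    return direction


def async_matd_sketch(num_agents=4, num_states=5, gamma=0.9, alpha=0.05,
                      tau_max=3, steps=5000, seed=0):
    """Assumed form: theta_{k+1} = theta_k + (alpha / N) * sum_i g_i(stale theta, stale obs)."""
    rng = random.Random(seed)
    # Every agent runs its own replica of the same chain (assumption: a random
    # walk on a ring under the fixed policy, with reward 1 when leaving state 0).
    states = [rng.randrange(num_states) for _ in range(num_agents)]
    observations = [[] for _ in range(num_agents)]  # per-agent (s, r, s_next) history
    theta_history = [[0.0] * num_states]            # theta_0, theta_1, ...

    for k in range(steps):
        # Each agent takes one step in its replica and records the transition.
        for i in range(num_agents):
            s = states[i]
            s_next = (s + rng.choice([-1, 1])) % num_states
            reward = 1.0 if s == 0 else 0.0
            observations[i].append((s, reward, s_next))
            states[i] = s_next

        # The server aggregates delayed directions; the delay of agent i at
        # iteration k is drawn uniformly from {0, ..., min(tau_max, k)} (assumption).
        theta = theta_history[-1]
        avg = [0.0] * num_states
        for i in range(num_agents):
            tau = rng.randint(0, min(tau_max, k))
            stale_theta = theta_history[k - tau]
            s, reward, s_next = observations[i][k - tau]
            g = td_direction(stale_theta, s, reward, s_next, gamma)
            for j in range(num_states):
                avg[j] += g[j] / num_agents

        theta_history.append([t + alpha * a for t, a in zip(theta, avg)])

    return theta_history[-1]


if __name__ == "__main__":
    print(async_matd_sketch())
```

Setting `tau_max=0` in this sketch recovers a synchronous multi-agent baseline, which gives a simple way to compare delayed and non-delayed behaviour under the same seed.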

Paper Structure (6 sections, 5 theorems, 75 equations, 2 figures)

Introduction
System Model and Problem Formulation
Main result
Proof of the main result
Simulations
Conclusion and Future Work

Key Result

Theorem 1

Consider the update rule of AsyncMATD in (eq:updateRule). There exist universal constants $C_0, C_1, C_2, C_3 \geq 1$, such that, for $\alpha \leq \frac{\omega(1-\gamma)}{C_0(\tau + \tau_{max})}$ and $T \geq \tau + 2\tau_{max}$,

Figures (2)

Figure 1: System Model. Agents $1, \dots, N$ cooperatively learn a common policy interacting with replicas of the same MDP. At each iteration $k$, the server uses the available delayed update directions with delays $\tau_{1,k}, \dots, \tau_{N,k}$.
Figure 2: Comparison between vanilla MATD and AsyncMATD in single-agent ($N = 1$) and multi-agent ($N = 20$) settings. For AsyncMATD, we set $\tau_{max} = 100$.

Theorems & Definitions (8)

Definition 1
Theorem 1
Lemma 1
Lemma 2
Lemma 3
proof
Lemma 4
proof
