Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Matthias Gerstgrasser; Tom Danino; Sarah Keren

TL;DR

This work presents Selective Multi-Agent Prioritized Experience Relay (SUPER), a multi-agent RL approach in which each agent shares with other agents a limited number of the transitions it observes during training. The approach outperforms both baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms.

Abstract

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
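To make the idea of sharing "a limited number of transitions" concrete, the following is a minimal sketch of one possible selection rule: keep roughly the top `bandwidth` fraction of a batch, ranked by a per-transition priority. It is not taken from the reference implementation. The names `Transition`, `select_for_sharing`, and `relay` are hypothetical, and using something like an absolute TD error as the priority is an assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Transition:
    """One environment step as stored in a replay buffer (hypothetical layout)."""
    obs: tuple
    action: int
    reward: float
    next_obs: tuple
    done: bool


def select_for_sharing(
    batch: Sequence[Transition],
    priorities: Sequence[float],
    bandwidth: float = 0.1,
) -> List[Transition]:
    """Return roughly the top `bandwidth` fraction of `batch` by priority.

    `priorities` holds one score per transition. Treating it as, e.g., an
    absolute TD error is an assumption made for this sketch only.
    """
    if len(batch) != len(priorities):
        raise ValueError("batch and priorities must have the same length")
    if not 0.0 <= bandwidth <= 1.0:
        raise ValueError("bandwidth must lie in [0, 1]")
    n_share = int(round(bandwidth * len(batch)))
    if n_share == 0:
        return []
    # Rank indices by priority, highest first; ties keep their original order.
    ranked = sorted(range(len(batch)), key=lambda i: priorities[i], reverse=True)
    keep = sorted(ranked[:n_share])  # restore the original temporal order
    return [batch[i] for i in keep]


def relay(shared: Sequence[Transition], peer_buffers: Sequence[list]) -> None:
    """Append the selected transitions to every other agent's replay buffer."""
    for buffer in peer_buffers:
        buffer.extend(shared)
```

With ten transitions and `bandwidth=0.2`, only the two highest-priority transitions are relayed, so the communication cost stays small relative to sharing every experience.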

Paper Structure (36 sections, 5 equations, 11 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
SUPER: Selective Multi-Agent Prioritized Experience Relay
Experience Selection
Deterministic Quantile experience selection
Deterministic Gaussian experience selection
Stochastic weighted experience selection
Experiments and Results
Algorithm and control benchmarks
Environments
Experimental Setup
Performance Evaluation
Ablations
Bandwidth Sensitivity
...and 21 more sections

Figures (11)

Figure 1: Performance of SUPER-dueling-DDQN variants with target bandwidth 0.1 on all domains. For team settings, performance is the total reward of all agents in the sharing team; for all other domains performance is the total reward of all agents. Shaded areas indicate one standard deviation.
Figure 2: Performance of SUPER-dueling-DDQN variants and baselines on all three PettingZoo domains. For Pursuit, performance is the total mean episode reward from all agents. For Battle and Adversarial-Pursuit, performance is the total mean episode reward from all agents in the sharing team (blue team in Battle, prey team in Adversarial-Pursuit). Shaded areas indicate one standard deviation.
Figure 3: Performance of quantile SUPER vs share-all and uniform random experience sharing in Pursuit at 800k timesteps.
Figure 4: Performance of quantile SUPER with varying bandwidth in Pursuit at 1-2M timesteps.
Figure 5: Pursuit Environment
...and 6 more figures
