Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Matthias Gerstgrasser, Tom Danino, Sarah Keren

TL;DR

This work presents a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which each agent shares a limited number of the transitions it observes during training with the other agents. The approach outperforms both baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms.

Abstract

We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay (SUPER), in which agents share with other agents a limited number of transitions they observe during training. The intuition is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
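
To make the idea of sharing only a small number of highly relevant transitions concrete, the following is a minimal sketch and not the reference implementation linked above. It assumes that relevance is measured by the absolute TD error of a transition, and that an agent shares a transition when that value falls in the top `bandwidth` fraction of its own recent values (loosely modelled on the "quantile" variant and the "target bandwidth" named in the figure captions). The names `QuantileSharingGate`, `should_share`, and `relay` are hypothetical and chosen only for illustration.

```python
from collections import deque


class QuantileSharingGate:
    """Hypothetical helper: decides whether a transition is relevant enough to share.

    Assumption: relevance is |TD error|, compared against a sliding window of the
    agent's own recent |TD error| values.
    """

    def __init__(self, bandwidth=0.1, window=1000):
        if not 0.0 < bandwidth <= 1.0:
            raise ValueError("bandwidth must be in (0, 1]")
        self.bandwidth = bandwidth          # target fraction of transitions to share
        self.recent = deque(maxlen=window)  # recent |TD error| values

    def should_share(self, td_error):
        priority = abs(td_error)
        self.recent.append(priority)
        # Sorting on every call keeps the sketch short; it is not efficient.
        ranked = sorted(self.recent)
        # Position of the (1 - bandwidth) quantile within the window.
        cutoff_index = min(int((1.0 - self.bandwidth) * len(ranked)), len(ranked) - 1)
        return priority >= ranked[cutoff_index]


def relay(sender_id, transition, buffers):
    """Append a shared transition to the replay buffer of every agent except the sender."""
    for agent_id, buffer in buffers.items():
        if agent_id != sender_id:
            buffer.append(transition)


# Usage: two agents with plain lists standing in for replay buffers.
buffers = {"agent_0": [], "agent_1": []}
gates = {agent_id: QuantileSharingGate(bandwidth=0.1) for agent_id in buffers}

# A transition is a placeholder tuple (state, action, reward, next_state).
transition = ("s", 0, 1.0, "s_next")
buffers["agent_0"].append(transition)        # the agent always keeps its own experience
if gates["agent_0"].should_share(td_error=2.5):
    relay("agent_0", transition, buffers)    # other agents receive only the selected ones
```

In this sketch each agent still trains on its own buffer, and the only communication between agents is the occasional relayed transition, which matches the limited communication channel described in the abstract.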

Paper Structure (36 sections, 5 equations, 11 figures, 6 tables, 1 algorithm)

Figures (11)

  • Figure 1: Performance of SUPER-dueling-DDQN variants with target bandwidth 0.1 on all domains. For team settings, performance is the total reward of all agents in the sharing team; for all other domains performance is the total reward of all agents. Shaded areas indicate one standard deviation.
  • Figure 2: Performance of SUPER-dueling-DDQN variants and baselines on all three PettingZoo domains. For Pursuit, performance is the total mean episode reward from all agents. For Battle and Adversarial-Pursuit, performance is the total mean episode reward from all agents in the sharing team (blue team in Battle, prey team in Adversarial-Pursuit). Shaded areas indicate one standard deviation.
  • Figure 3: Performance of quantile SUPER vs share-all and uniform random experience sharing in Pursuit at 800k timesteps.
  • Figure 4: Performance of quantile SUPER with varying bandwidth in Pursuit at 1-2M timesteps.
  • Figure 5: Pursuit Environment
  • ...and 6 more figures