Selective Reincarnation: Offline-to-Online Multi-Agent Reinforcement Learning

Claude Formanek, Callum Rhys Tilbury, Jonathan Shock, Kale-ab Tessera, Arnu Pretorius

TL;DR

In the fully-cooperative multi-agent (MA) setting with heterogeneous agents, selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation.

Abstract

'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.
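
To make the idea of selective reincarnation concrete, the following is a minimal sketch of how the candidate choices could be enumerated: for a team of agents and a number $x$ of agents to reincarnate, each plan marks which agents reuse prior computation and which start from scratch. The function name `reincarnation_plans`, the labels `"reincarnated"` and `"from_scratch"`, and the three agent names are hypothetical and chosen only for illustration; none of them come from the paper.

```python
from itertools import combinations

def reincarnation_plans(agent_ids, x):
    """Yield every plan in which exactly x of the agents are reincarnated.

    Hypothetical helper for illustration: each plan maps an agent id to
    "reincarnated" (reuses prior computation) or "from_scratch".
    """
    for chosen in combinations(agent_ids, x):
        yield {
            agent: "reincarnated" if agent in chosen else "from_scratch"
            for agent in agent_ids
        }

# Hypothetical three-agent team; reincarnate exactly one agent.
plans = list(reincarnation_plans(["a", "b", "c"], 1))
for plan in plans:
    print(plan)
```

Because the agents are heterogeneous, these plans are not interchangeable, which is why the abstract stresses that the choice of which agents to reincarnate matters.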

Paper Structure

This paper contains 13 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Performance using the two different teacher datasets. In the plot, a solid line indicates the mean value over the runs, and the shaded region indicates one standard error above and below the mean. In the table, values are given with one standard error.
  • Figure 2: Selective reincarnation performance, aggregated over the number of agents reincarnated. In the plot, a solid line indicates the mean value over the runs, and the shaded region indicates one standard error above and below the mean. In the table, values are given with one standard error. A reminder: take care when comparing the standard errors across values of $x$, since the number of runs depends on $\binom{6}{x}$.
  • Figure 3: Training curves for the best and worst combinations of reincarnated agents, decided by the average episode return achieved. A solid line indicates the mean value over five seeds, and the shaded region indicates one standard error above and below the mean. In the panels for one to five reincarnated agents, the green and red lines indicate the maximum return achieved by the tabula rasa and fully-reincarnated approaches respectively.
  • Figure 4: MARL-eval (Gorsane et al., 2022; Agarwal et al., 2021) plots comparing the best-performing combination, based on final performance after 250k training steps, of $x$ reincarnated agents for each $x \in [0, n]$.
  • Figure A.1: The HalfCheetah environment (Wawrzyński, 2007; Todorov et al., 2012) viewed from the perspective of six separate agents (Peng et al., 2021). The array indices from the MAMuJoCo environment are given in brackets. Note that this diagram is purely illustrative and is not drawn with the correct relative scale.
  • ...and 2 more figures
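
The caution in the Figure 2 caption comes down to counting: with six agents, there are $\binom{6}{x}$ distinct ways to reincarnate $x$ of them, so each aggregated curve pools a different number of runs. The sketch below tabulates those counts. It assumes five seeds per combination, extending the seed count stated in the Figure 3 caption to every combination; the names `N_AGENTS`, `SEEDS_PER_COMBO`, and `runs_per_group` are hypothetical.

```python
from math import comb

N_AGENTS = 6         # six agents, as in Figure A.1
SEEDS_PER_COMBO = 5  # assumption: five seeds for every combination

def runs_per_group(n_agents: int = N_AGENTS, seeds: int = SEEDS_PER_COMBO) -> dict[int, int]:
    """Map x (number of reincarnated agents) to the number of runs pooled for that x."""
    return {x: comb(n_agents, x) * seeds for x in range(n_agents + 1)}

counts = runs_per_group()
for x, n_runs in counts.items():
    print(f"x={x}: {comb(N_AGENTS, x)} combinations, {n_runs} runs")
```

Under this assumption, $x = 3$ pools 20 combinations while $x = 0$ and $x = 6$ each pool one. Since the standard error of a mean shrinks with the square root of the number of runs, a narrower shaded band at $x = 3$ may partly reflect the larger sample rather than lower variability, all else being equal.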

Theorems & Definitions (2)

  • Definition 1: Multi-Agent Reincarnation
  • Definition 2: Selective Reincarnation