Selective Reincarnation: Offline-to-Online Multi-Agent Reinforcement Learning

Claude Formanek; Callum Rhys Tilbury; Jonathan Shock; Kale-ab Tessera; Arnu Pretorius

TL;DR

In the fully-cooperative multi-agent (MA) setting with heterogeneous agents, selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation.

Abstract

'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.

Paper Structure (13 sections, 3 equations, 7 figures, 1 table)

Introduction
Preliminaries
Multi-Agent Reinforcement Learning
Independent Q-Learning
Related Work
Definitions
Case Study: Selectively-Reincarnated Policy-to-Value MARL
Impact of Teacher Dataset Quality
Arbitrarily Selective Reincarnation
Targeted Selective Reincarnation Matters
Roadmap for Multi-Agent Reincarnation
Conclusion
Appendix

Figures (7)

Figure 1: Performance using the two different teacher datasets. In the plot, a solid line indicates the mean value over the runs, and the shaded region indicates one standard error above and below the mean. In the table, values are given with one standard error.
Figure 2: Selective reincarnation performance, aggregated over the number of agents reincarnated. In the plot, a solid line indicates the mean value over the runs, and the shaded region indicates one standard error above and below the mean. In the table, values are given with one standard error. A reminder: take caution when comparing the standard error metrics across values of $x$, since the number of runs depends on ${{6}\choose{x}}$.
Figure 3: Training curves for the best and worst combinations of reincarnated agents, decided by the average episode return achieved. A solid line indicates the mean value over five seeds, and the shaded region indicates one standard error above and below the mean. In the subfigures for one to five reincarnated agents, the green and red lines indicate the maximum return achieved by the tabula rasa and fully-reincarnated approaches respectively.
Figure 4: MARL-eval (Gorsane et al., 2022; Agarwal et al., 2021) plots comparing the best performing combination, based on final performance after 250k training steps, of $x$ reincarnated agents for each $x\in [0,n]$.
Figure A.1: The HalfCheetah environment (Wawrzynski, 2007; Todorov et al., 2012) viewed from the perspective of six separate agents (Peng et al., 2021). The array indices from the MAMuJoCo environment are given in brackets. Note that this diagram is purely illustrative and is not drawn with the correct relative scale.
...and 2 more figures
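
The caution in the Figure 2 caption follows from simple counting: with six agents, the number of distinct ways to reincarnate exactly $x$ of them is ${{6}\choose{x}}$, so each value of $x$ aggregates a different number of agent combinations. The sketch below is illustrative only and is not taken from the paper's code. The agent indices 0 to 5 and the helper name `reincarnation_subsets` are hypothetical, chosen here for the example.

```python
from itertools import combinations
from math import comb

# Six agents, matching the six-agent HalfCheetah setup described in Figure A.1.
# The integer indices 0..5 are placeholder labels, not the MAMuJoCo array indices.
N_AGENTS = 6


def reincarnation_subsets(n_agents, x):
    """Return every way to choose x of n_agents agents to reincarnate.

    Agents not in the chosen subset are the ones trained from scratch.
    """
    return list(combinations(range(n_agents), x))


# Number of distinct combinations for each count x of reincarnated agents.
counts = {x: len(reincarnation_subsets(N_AGENTS, x)) for x in range(N_AGENTS + 1)}

for x, count in counts.items():
    # Sanity check against the closed-form binomial coefficient.
    assert count == comb(N_AGENTS, x)
    print(f"x = {x}: {count} combination(s)")
```

Because the number of combinations peaks at $x = 3$ and falls to one at $x = 0$ and $x = 6$, a standard error computed at one value of $x$ rests on a different sample size than one computed at another.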

Theorems & Definitions (2)

Definition 1: Multi-Agent Reincarnation
Definition 2: Selective Reincarnation
