Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning
Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian
TL;DR
This work tackles the notorious sample inefficiency of multi-agent reinforcement learning by proposing a simple yet effective change: increase the Replay Ratio ($RR$), i.e., perform multiple gradient updates per episode to better exploit collected data. The approach is shown to be generally beneficial across three widely used MARL baselines (VDN, QMIX, QPLEX) on six StarCraft II SMAC tasks, with $RR$ values of 2 or 4 yielding faster convergence and higher final performance, while excessive $RR$ can cause overfitting in some tasks. The authors address potential plasticity loss via Dormant Neural Ratio ($DNR$) analysis and show that shared RNNs help maintain network plasticity, making resets unnecessary except under extreme $RR$. They also explore the trade-off between computation budget and data budget, and compare $RR$ against larger batch sizes and learning rates, finding $RR$ to be a more effective lever for improving sample efficiency in MARL. The work provides open-source code and suggests future directions such as adaptive $RR$ and the exploration of higher $RR$ values.
Abstract
Poor sample efficiency is one of the notorious issues in Reinforcement Learning (RL). Compared to single-agent RL, achieving sample efficiency in Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods that enhance sample efficiency, we instead examine the widely used episodic training mechanism. In each training step, tens of frames are collected, but only one gradient step is made. We argue that this episodic training could be a source of poor sample efficiency. To better exploit the data already collected, we propose to increase the frequency of gradient updates per environment interaction (a.k.a. Replay Ratio or Update-To-Data ratio). To show its generality, we evaluate $3$ MARL methods on $6$ SMAC tasks. The empirical results validate that a higher replay ratio significantly improves the sample efficiency of MARL algorithms. The code to reproduce the results presented in this paper is open-sourced at https://anonymous.4open.science/r/rr_for_MARL-0D83/.
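The difference between standard episodic training and training with a higher replay ratio can be illustrated with a minimal sketch. This is not the authors' implementation: `collect_episode`, `gradient_step`, and `train`, along with the buffer capacity, batch size, and episode length, are hypothetical placeholders chosen only to show where the replay ratio enters the loop. Setting `replay_ratio=1` corresponds to the one-update-per-episode scheme described above.

```python
import random
from collections import deque


def collect_episode(rng, episode_len=20):
    """Hypothetical stand-in for a rollout: returns a list of fake frames."""
    return [rng.random() for _ in range(episode_len)]


def gradient_step(batch):
    """Hypothetical stand-in for one optimizer update on a batch of episodes."""
    # A real learner (e.g. a value-factorization method) would compute a
    # TD loss on the batch and apply an optimizer step here.
    return sum(sum(episode) for episode in batch) / len(batch)


def train(num_episodes, replay_ratio, batch_size=4, capacity=100, seed=0):
    """Episodic training loop with `replay_ratio` updates per collected episode."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)  # episodic replay buffer (assumed FIFO)
    env_steps = 0
    updates = 0
    for _ in range(num_episodes):
        episode = collect_episode(rng)
        buffer.append(episode)
        env_steps += len(episode)
        if len(buffer) < batch_size:
            continue  # wait until a full batch can be sampled
        # replay_ratio == 1 is the default scheme; larger values reuse the
        # collected data for additional gradient updates.
        for _ in range(replay_ratio):
            batch = rng.sample(list(buffer), batch_size)
            gradient_step(batch)
            updates += 1
    return env_steps, updates


if __name__ == "__main__":
    for rr in (1, 2, 4):
        steps, n_updates = train(num_episodes=10, replay_ratio=rr)
        print(f"RR={rr}: env_steps={steps}, gradient_updates={n_updates}")
```

In this sketch the number of environment frames is identical for every value of `replay_ratio`; only the number of gradient updates changes, which is the sense in which a higher replay ratio trades extra computation for better use of the same data.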
