Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games
Bingyu Hui, Lebin Yu, Quanming Yao, Yunpeng Qu, Xudong Zhang, Jian Wang
TL;DR
This work tackles zero-shot coordination in evolving multi-agent environments by introducing ScaPT, a scalable population training framework. It combines a hierarchical meta-agent that shares parameters to simulate large populations with a mutual information-based diversity regularizer, enabling efficient and diverse population training. Instantiated in a value-based RL setting and evaluated on matrix games and Hanabi, ScaPT consistently outperforms prior methods, especially as population size grows under fixed computational budgets. The findings underscore the critical role of scalable population design and diversity-promoting objectives for robust zero-shot coordination in complex, evolving games.
Abstract
Zero-shot coordination (ZSC), a key challenge in multi-agent game theory, has recently attracted considerable attention in reinforcement learning (RL) research, especially in complex evolving games. It concerns the generalization ability of agents: without any fine-tuning, they must coordinate well with previously unseen collaborators drawn from a diverse, potentially evolving pool of partners. Population-based training, which approximates such an evolving partner pool, has been shown to yield good zero-shot coordination performance. However, existing methods are limited by computational resources: they mainly focus on optimizing diversity in small populations and neglect the potential performance gains from scaling up the population size. To address this issue, this paper proposes Scalable Population Training (ScaPT), an efficient RL training framework comprising two key components: a meta-agent that efficiently realizes a population by selectively sharing parameters across agents, and a mutual information regularizer that guarantees population diversity. To empirically validate ScaPT, this paper evaluates it alongside representative frameworks in the cooperative game Hanabi and confirms its superiority.
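The two components named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The class `MetaAgent`, the split into one shared trunk plus one small head per sub-agent, the softmax over Q-values, and the choice of I(Z; A) = H(mean policy) - mean H(per-agent policy) as the diversity measure are all assumptions made here for illustration; the abstract does not specify the architecture or the exact form of the regularizer.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


class MetaAgent:
    """Hypothetical meta-agent: a shared trunk and one head per sub-agent.

    Assumption: "selectively sharing parameters" is modelled here as
    sharing the feature layer while keeping the output layers separate.
    """

    def __init__(self, n_agents, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.trunk = init(hidden_dim, obs_dim)  # shared by every sub-agent
        self.heads = [init(n_actions, hidden_dim) for _ in range(n_agents)]

    def q_values(self, agent_id, obs):
        # Shared ReLU features, followed by the head of the selected sub-agent.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, obs))) for row in self.trunk]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.heads[agent_id]]

    def policy(self, agent_id, obs):
        # Softmax over Q-values, used only to obtain a distribution for the
        # diversity estimate below (an illustrative choice).
        return softmax(self.q_values(agent_id, obs))


def mutual_information(policies):
    """I(Z; A) at one observation, with Z uniform over sub-agents.

    Computed as H(mean policy) - mean H(policy_z). It is zero when all
    sub-agents act identically and grows as their behaviour diverges.
    """
    n = len(policies)
    n_actions = len(policies[0])
    marginal = [sum(p[a] for p in policies) / n for a in range(n_actions)]
    return entropy(marginal) - sum(entropy(p) for p in policies) / n


if __name__ == "__main__":
    meta = MetaAgent(n_agents=4, obs_dim=3, hidden_dim=5, n_actions=2)
    obs = [0.1, -0.2, 0.3]
    policies = [meta.policy(i, obs) for i in range(4)]
    print("diversity estimate:", mutual_information(policies))
```

Under this assumed layout, adding a sub-agent costs only one extra head while the trunk is reused, which is one way a large population could fit within a fixed compute budget. A quantity like `mutual_information` would be added to the training objective as a bonus so that the sub-agents do not collapse onto the same behaviour.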
