Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution

Ke Xue, Yutong Wang, Cong Guan, Lei Yuan, Haobo Fu, Qiang Fu, Chao Qian, Yang Yu

TL;DR

This work addresses zero-shot coordination with unseen, heterogeneous partners in cooperative MARL by introducing MAZE, a coevolutionary framework that maintains two policy populations (agents and partners) and evolves them through pairing, updating, and selection while preserving diversity via an archive. The approach leverages a PPO-like update augmented with a diversity term and uses Jensen-Shannon Divergence across population policies to encourage varied behaviors, enabling robust coordination with novel partners. Empirical validation in Overcooked and FillInTheGrid across multiple heterogeneous layouts shows MAZE outperforms self-play and prior diversity-based baselines, and human-in-the-loop experiments corroborate improved real-human coordination. The work highlights heterogeneity as essential in ZSC and offers a scalable, modular framework that can be integrated with existing ZSC techniques and extended to broader heterogeneous multi-agent tasks.
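The diversity measure mentioned above, Jensen-Shannon Divergence across the policies of a population, can be illustrated on a single state. The sketch below is not taken from the paper's code: it assumes each policy is represented by a discrete action distribution at one state, and it uses the standard generalized JSD with uniform weights (entropy of the mixture minus the mean entropy of the members). The function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_shannon_divergence(dists):
    """Generalized JSD of several action distributions with uniform weights.

    Assumption: every entry of `dists` is a probability vector over the same
    action set, e.g. the output of each population policy at one state.
    """
    n = len(dists)
    num_actions = len(dists[0])
    # Mixture policy: the average action distribution of the population.
    mixture = [sum(d[a] for d in dists) / n for a in range(num_actions)]
    # JSD = H(mixture) - mean of H(individual distributions).
    return entropy(mixture) - sum(entropy(d) for d in dists) / n


# Identical policies give zero diversity; disjoint deterministic ones give log 2.
same = jensen_shannon_divergence([[0.2, 0.8], [0.2, 0.8]])
disjoint = jensen_shannon_divergence([[1.0, 0.0], [0.0, 1.0]])
```

A value near zero indicates that the population policies behave almost identically at that state, while larger values indicate more varied behavior. In a training objective such a term would typically be averaged over visited states, which this sketch leaves out.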

Abstract

Generating agents that can achieve zero-shot coordination (ZSC) with unseen partners is a new challenge in cooperative multi-agent reinforcement learning (MARL). Recently, some studies have made progress in ZSC by exposing the agents to diverse partners during the training process. They usually involve self-play when training the partners, implicitly assuming that the tasks are homogeneous. However, many real-world tasks are heterogeneous, and hence previous methods may be inefficient. In this paper, we study the heterogeneous ZSC problem for the first time and propose a general method based on coevolution, which coevolves two populations of agents and partners through three sub-processes: pairing, updating and selection. Experimental results on various heterogeneous tasks highlight the necessity of considering the heterogeneous setting and demonstrate that our proposed method is a promising solution for heterogeneous ZSC tasks.
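The three sub-processes named in the abstract (pairing, updating, selection) can be sketched as a generic coevolution loop. This is a toy illustration under heavy assumptions, not the authors' algorithm: policies are plain floats, pairing is uniformly random, the update is a random perturbation standing in for an RL update, and selection is truncation by mean payoff. The names `coevolve`, `toy_payoff`, and `toy_update` are hypothetical.

```python
import random


def coevolve(agents, partners, payoff, update, generations=5, seed=0):
    """Coevolve two populations through pairing, updating, and selection."""
    rng = random.Random(seed)
    n_agents, n_partners = len(agents), len(partners)
    for _ in range(generations):
        # Pairing (assumption: each agent gets a uniformly random partner).
        pairs = [(a, rng.choice(partners)) for a in agents]
        # Updating: each pair yields an updated agent and an updated partner.
        updated = [update(a, p, rng) for a, p in pairs]
        cand_agents = agents + [u[0] for u in updated]
        cand_partners = partners + [u[1] for u in updated]

        # Selection (assumption: keep the candidates with the best mean
        # payoff against the whole opposing candidate population).
        def agent_score(a):
            return sum(payoff(a, p) for p in cand_partners) / len(cand_partners)

        def partner_score(p):
            return sum(payoff(a, p) for a in cand_agents) / len(cand_agents)

        agents = sorted(cand_agents, key=agent_score, reverse=True)[:n_agents]
        partners = sorted(cand_partners, key=partner_score, reverse=True)[:n_partners]
    return agents, partners


def toy_payoff(a, p):
    """Toy common payoff: highest when the two 'policies' sum to 1."""
    return -(a + p - 1.0) ** 2


def toy_update(a, p, rng):
    """Stand-in for a learning step: perturb both members of the pair."""
    return a + rng.gauss(0.0, 0.1), p + rng.gauss(0.0, 0.1)


final_agents, final_partners = coevolve(
    [0.0, 0.2, 0.4], [0.1, 0.3, 0.5], toy_payoff, toy_update
)
```

Keeping agents and partners in separate populations is what lets the two sides differ in role, which is the heterogeneous setting the abstract emphasizes. The archive and the learned policy update described in the TL;DR are deliberately omitted here.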

Paper Structure

This paper contains 36 sections, 2 equations, 10 figures, 8 tables, 1 algorithm.

Figures (10)

  • Figure 1: An example illustration of Human-AI coordination with increasing heterogeneity.
  • Figure 2: Illustration of the MAZE method, where MAZE coevolves two populations of agents and partners through three sub-processes, i.e., pairing, updating and selection.
  • Figure 3: The AA layout of the Overcooked environment. Overcooked is a two-player common-payoff game in which players must coordinate to cook and deliver soup.
  • Figure 4: Illustration of different layouts on Overcooked.
  • Figure 5: FillInTheGrid environments. (a) Grid-T layout. (b) Grid-L layout.
  • ...and 5 more figures