Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing
Haobin Jiang, Ziluo Ding, Zongqing Lu
TL;DR
The paper tackles decentralized multi-agent exploration under sparse rewards by enabling coordination through limited communication. It introduces MACE, which combines two components: an approximation of global novelty as the sum of agents' local novelties, and a hindsight-based intrinsic reward derived from the weighted mutual information between an agent's action and other agents' accumulated novelty, which encourages actions that influence others' exploration. The final shaped reward $r_s^i = r_\text{ext} + r_\text{nov}^i + \lambda r_\text{hin}^i$ is optimized with independent PPO, and a scalable variant uses the sum of other agents' novelty to reduce cross-agent dependencies. Experiments on GridWorld, Overcooked, and SMAC show improved coordinated exploration and sample efficiency, with ablations confirming that both components are necessary. The approach offers a practical, low-communication-cost route to robust decentralized coordination in complex, sparse-reward domains.
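As a rough illustration of how the shaped reward combines its three terms, the following minimal sketch computes $r_s^i$ for one agent at one timestep. The function name `shaped_reward`, the argument names, and the default value of `lam` are hypothetical choices made for this example. The hindsight term is taken as a precomputed input, since the summary does not spell out how the weighted mutual information is estimated.

```python
from typing import Sequence


def shaped_reward(
    r_ext: float,
    local_novelties: Sequence[float],
    r_hin: float,
    lam: float = 0.1,  # assumed default; the summary does not give a value for lambda
) -> float:
    """Combine extrinsic, novelty, and hindsight terms into one shaped reward.

    local_novelties holds the local novelty reported by every agent
    (including this one); their sum stands in for the global novelty.
    """
    r_nov = sum(local_novelties)  # global novelty approximated by a sum
    return r_ext + r_nov + lam * r_hin


# Example: two agents, extrinsic reward 1.0, hindsight reward 2.0.
reward = shaped_reward(1.0, [0.5, 0.25], 2.0, lam=0.5)
```

Each agent would then feed this scalar to its own independent PPO learner in place of the raw environment reward.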
Abstract
Exploration in decentralized cooperative multi-agent reinforcement learning faces two challenges. One is that the novelty of global states is unavailable, while the novelty of local observations is biased. The other is how agents can explore in a coordinated way. To address these challenges, we propose MACE, a simple yet effective multi-agent coordinated exploration method. By communicating only local novelty, agents can take into account other agents' local novelty to approximate the global novelty. Further, we introduce weighted mutual information to measure the influence of one agent's action on other agents' accumulated novelty. We convert it into an intrinsic reward in hindsight to encourage agents to exert more influence on other agents' exploration and boost coordinated exploration. Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
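The novelty-sharing step can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it assumes a count-based local novelty of the form $1/\sqrt{N(o)}$ over hashable observations, and the names `LocalNoveltyCounter` and `share_novelty` are hypothetical. Each agent keeps its own visit counts, broadcasts only a single scalar, and uses the sum of all received scalars as its approximation of global novelty.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


class LocalNoveltyCounter:
    """Per-agent visit counter over local observations (assumed count-based)."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def novelty(self, obs: Hashable) -> float:
        # Record the visit, then return 1 / sqrt(visit count).
        self.counts[obs] += 1
        return 1.0 / math.sqrt(self.counts[obs])


def share_novelty(
    counters: Sequence[LocalNoveltyCounter],
    observations: Sequence[Hashable],
) -> Tuple[List[float], float]:
    """Return each agent's local novelty and the shared sum of all of them."""
    local = [c.novelty(o) for c, o in zip(counters, observations)]
    return local, sum(local)


# Example: two agents observe the same local observations twice in a row.
agents = [LocalNoveltyCounter(), LocalNoveltyCounter()]
first_local, first_total = share_novelty(agents, ["a", "b"])
second_local, second_total = share_novelty(agents, ["a", "b"])
```

Only one scalar per agent crosses the communication channel at each step, which is what keeps the communication cost low in this sketch.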
