Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

Haobin Jiang, Ziluo Ding, Zongqing Lu

TL;DR

The paper tackles decentralized multi-agent exploration under sparse rewards by enabling coordination through limited communication. It introduces MACE, which combines two components: an approximation of global novelty (the sum of agents' local novelties) and a hindsight-based intrinsic reward derived from the weighted mutual information between an agent's action and other agents' accumulated novelty, which encourages actions that influence others' exploration. The final shaped reward $r_s^i = r_\text{ext} + r_\text{nov}^i + \lambda r_\text{hin}^i$ is optimized with independent PPO, and a scalable variant uses the sum of others' novelty to reduce cross-agent dependencies. Experiments on GridWorld, Overcooked, and SMAC show improved coordinated exploration and sample efficiency, and ablations indicate that both components are needed. The approach offers a practical, low-communication-cost path to decentralized coordination in complex, sparse-reward domains.
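As a rough illustration of how the shaped reward in the summary above could be assembled, the following minimal sketch combines a shared extrinsic reward, a novelty term, and a hindsight term for each agent. The function name `shaped_rewards`, the default `lam=0.1`, and the choice to give every agent the same novelty term (the plain sum of all communicated local novelties) are assumptions made for this example; the hindsight rewards are taken as given inputs because the summary does not specify how they are computed.

```python
from typing import List, Sequence


def shaped_rewards(
    r_ext: float,
    local_novelty: Sequence[float],
    r_hin: Sequence[float],
    lam: float = 0.1,  # hypothetical default for the weight lambda
) -> List[float]:
    """Return one shaped reward per agent: r_ext + r_nov + lam * r_hin[i].

    r_ext:         extrinsic reward shared by all agents at this step.
    local_novelty: local novelty value communicated by each agent.
    r_hin:         hindsight intrinsic reward of each agent (given, not derived here).
    """
    if len(local_novelty) != len(r_hin):
        raise ValueError("local_novelty and r_hin must have one entry per agent")

    # Assumption: the global novelty is approximated by summing all local
    # novelties, and every agent receives this same value as its novelty term.
    r_nov = sum(local_novelty)

    return [r_ext + r_nov + lam * h for h in r_hin]


if __name__ == "__main__":
    # Two agents, a sparse extrinsic reward of 1.0 at this step.
    print(shaped_rewards(1.0, [0.2, 0.3], [0.5, 0.0], lam=0.1))
```

In this sketch only the scalar local novelties cross agent boundaries, which matches the limited-communication setting described above; each agent would then feed its own shaped reward to its independent PPO learner.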

Abstract

Exploration in decentralized cooperative multi-agent reinforcement learning faces two challenges. One is that the novelty of global states is unavailable, while the novelty of local observations is biased. The other is how agents can explore in a coordinated way. To address these challenges, we propose MACE, a simple yet effective multi-agent coordinated exploration method. By communicating only local novelty, agents can take into account other agents' local novelty to approximate the global novelty. Further, we introduce weighted mutual information to measure the influence of one agent's action on other agents' accumulated novelty. We convert it into an intrinsic reward in hindsight to encourage agents to exert more influence on other agents' exploration and boost coordinated exploration. Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
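To make the idea of weighted mutual information more concrete, here is a minimal sketch using its generic form, $\sum_{a,z} w(a,z)\, p(a,z) \log \frac{p(a,z)}{p(a)\,p(z)}$, over a small discrete joint distribution. This form, the function name `weighted_mutual_information`, and the example weights are assumptions for illustration; the abstract does not state the exact weighting MACE uses. Here `a` stands for an agent's action and `z` for a discretized value of other agents' accumulated novelty.

```python
import math
from typing import Optional, Sequence


def weighted_mutual_information(
    joint: Sequence[Sequence[float]],
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Weighted mutual information in nats for a discrete joint table.

    joint[a][z]   = p(a, z); entries are non-negative and sum to 1.
    weights[a][z] = non-negative weight of the pair (a, z).
    With weights=None every weight is 1, which gives ordinary mutual information.
    """
    p_a = [sum(row) for row in joint]        # marginal over z
    p_z = [sum(col) for col in zip(*joint)]  # marginal over a

    total = 0.0
    for a, row in enumerate(joint):
        for z, p in enumerate(row):
            if p > 0.0:  # zero-probability pairs contribute nothing
                w = 1.0 if weights is None else weights[a][z]
                total += w * p * math.log(p / (p_a[a] * p_z[z]))
    return total


if __name__ == "__main__":
    # The action fully determines z, so plain MI equals log(2).
    joint = [[0.5, 0.0], [0.0, 0.5]]
    print(weighted_mutual_information(joint))
    # Hypothetical weights that emphasize the second outcome.
    print(weighted_mutual_information(joint, [[1.0, 0.0], [0.0, 3.0]]))
```

Unlike plain mutual information, the weighted form lets pairs associated with larger weights (for example, higher novelty) contribute more to the measured influence.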

Paper Structure (30 sections, 16 equations, 15 figures, 6 tables, 1 algorithm)

Figures (15)

  • Figure 1: (a) Mutual information (MI) between action and reward in state 1 and state 2. (b) Weighted mutual information (WMI) between action and reward in state 1 and state 2.
  • Figure 2: GridWorld: (a) Pass. (b) SecretRoom. (c) MultiRoom.
  • Figure 3: Overcooked: (a) Base. (b) Narrow. (c) Large.
  • Figure 4: Learning curves of MACE compared with IPPO+$r_\text{loc}$, IPPO+$r_\text{nov}$, and IPPO+$r_\text{hin}$ on three GridWorld tasks: (a) Pass, (b) SecretRoom, and (c) MultiRoom.
  • Figure 5: Learning curves of MACE compared with MACE-MI and MACE-Z on three GridWorld tasks: (a) Pass, (b) SecretRoom, and (c) MultiRoom.
  • ...and 10 more figures