Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

Haobin Jiang; Ziluo Ding; Zongqing Lu

TL;DR

The paper tackles decentralized multi-agent exploration under sparse rewards by enabling coordination through limited communication. It introduces MACE, which combines an approximation of global novelty (the sum of agents' local novelties) with a hindsight-based intrinsic reward derived from the weighted mutual information between an agent's action and other agents' accumulated novelty, encouraging actions that influence others' exploration. The final shaped reward $r_s^i = r_{\text{ext}} + r_{\text{nov}}^i + \lambda r_{\text{hin}}^i$ is optimized with independent PPO, and a scalable variant uses the sum of others' novelty to reduce cross-agent dependencies. Experiments on GridWorld, Overcooked, and SMAC show improved coordinated exploration and sample efficiency, and ablations indicate that both components are needed. The approach offers a practical, low-communication-cost route to decentralized coordination in complex, sparse-reward domains.
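As a rough illustration of how the shaped reward is assembled, the sketch below sums the local novelties shared by all agents to stand in for global novelty, then adds the extrinsic reward and the hindsight term scaled by lambda. The function names, the default value of `lam`, and the assumption that novelty and hindsight values arrive as plain floats are all hypothetical; how each agent computes its local novelty and its hindsight reward is not specified in this summary.

```python
from typing import Sequence


def approx_global_novelty(local_novelties: Sequence[float]) -> float:
    """Approximate global novelty as the sum of every agent's local novelty.

    `local_novelties` holds one scalar per agent (including the agent itself),
    i.e. the values the agents communicate to each other.
    """
    return float(sum(local_novelties))


def shaped_reward(
    r_ext: float,
    local_novelties: Sequence[float],
    r_hin: float,
    lam: float = 0.1,  # hypothetical default; the weight is a tunable coefficient
) -> float:
    """Combine the terms as r_s = r_ext + r_nov + lam * r_hin."""
    r_nov = approx_global_novelty(local_novelties)
    return r_ext + r_nov + lam * r_hin


if __name__ == "__main__":
    # Three agents share their local novelties; no extrinsic reward yet.
    print(shaped_reward(r_ext=0.0, local_novelties=[0.5, 0.25, 0.25], r_hin=2.0, lam=0.5))
```

Each agent would feed its own shaped reward into its independent PPO update, so the only quantity exchanged between agents in this sketch is one novelty scalar per agent per step.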

Abstract

Exploration in decentralized cooperative multi-agent reinforcement learning faces two challenges. The first is that the novelty of global states is unavailable, while the novelty of local observations is biased. The second is how agents can explore in a coordinated way. To address these challenges, we propose MACE, a simple yet effective multi-agent coordinated exploration method. By communicating only local novelty, agents can take other agents' local novelty into account to approximate the global novelty. Further, we introduce weighted mutual information to measure the influence of one agent's action on other agents' accumulated novelty. We convert it into an intrinsic reward in hindsight to encourage agents to exert more influence on other agents' exploration and thereby boost coordinated exploration. Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
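To make the contrast between mutual information and weighted mutual information concrete, the sketch below computes both from a small joint probability table over an action variable and a discretized outcome variable. It uses one common form of weighted mutual information, in which each term of the usual sum is multiplied by a non-negative weight w(a, z). The weighting scheme, the discretization of accumulated novelty, and the function name are assumptions for illustration; the paper's exact definition and estimator are not given in this summary.

```python
import math
from typing import Optional, Sequence


def weighted_mutual_information(
    joint: Sequence[Sequence[float]],
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Return sum_{a,z} w(a,z) * p(a,z) * log(p(a,z) / (p(a) * p(z))).

    joint[a][z] is the joint probability p(a, z); entries must sum to 1.
    With weights=None every weight is 1 and the result is ordinary
    mutual information (in nats).
    """
    p_a = [sum(row) for row in joint]        # marginal over outcomes
    p_z = [sum(col) for col in zip(*joint)]  # marginal over actions
    total = 0.0
    for a, row in enumerate(joint):
        for z, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                w = 1.0 if weights is None else weights[a][z]
                total += w * p * math.log(p / (p_a[a] * p_z[z]))
    return total


if __name__ == "__main__":
    # The action fully determines the outcome: MI equals log 2.
    joint = [[0.5, 0.0], [0.0, 0.5]]
    print(weighted_mutual_information(joint))
    # Hypothetical weights that value the second outcome three times as much.
    print(weighted_mutual_information(joint, weights=[[1.0, 0.0], [0.0, 3.0]]))
```

With uniform weights, two situations in which the action is equally informative receive the same score. Non-uniform weights let the measure favor the situation whose outcomes carry larger values, which is the distinction Figure 1 draws between MI and WMI.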

Paper Structure (30 sections, 16 equations, 15 figures, 6 tables, 1 algorithm)

Introduction
Preliminary
Methodology
Approximation to Global Novelty
Influence on Other Agents’ Exploration
Intrinsic Reward in Hindsight
MACE
Related Work
Experiments
Setup
GridWorld
Overcooked
SMAC
Conclusion
Mutual Information in MARL
...and 15 more sections

Figures (15)

Figure 1: (a) Mutual information (MI) between action and reward in state 1 and state 2. (b) Weighted mutual information (WMI) between action and reward in state 1 and state 2.
Figure 2: GridWorld: (a) Pass. (b) SecretRoom. (c) MultiRoom.
Figure 3: Overcooked: (a) Base. (b) Narrow. (c) Large.
Figure 4: Learning curves of MACE compared with IPPO+r_loc, IPPO+r_nov, and IPPO+r_hin on three GridWorld tasks: (a) Pass, (b) SecretRoom, and (c) MultiRoom.
Figure 5: Learning curves of MACE compared with MACE-MI and MACE-Z on three GridWorld tasks: (a) Pass, (b) SecretRoom, and (c) MultiRoom.
...and 10 more figures
