Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination

Liangzhou Wang, Kaiwen Zhu, Fengming Zhu, Xinghu Yao, Shujie Zhang, Deheng Ye, Haobo Fu, Qiang Fu, Wei Yang

TL;DR

The proposed Multi-agent Goal Imagination (MAGI) framework guides agents to reach consensus with an imagined common goal. It directly models the distribution of future states with a self-supervised generative model, thus alleviating the "curse of dimensionality" problem induced by the multi-agent multi-step policy rollouts commonly used in model-based methods.

Abstract

Reaching consensus is key to multi-agent coordination. To accomplish a cooperative task, agents need to coherently select optimal joint actions to maximize the team reward. However, current cooperative multi-agent reinforcement learning (MARL) methods usually do not explicitly take consensus into consideration, which may cause miscoordination. In this paper, we propose a model-based consensus mechanism to explicitly coordinate multiple agents. The proposed Multi-agent Goal Imagination (MAGI) framework guides agents to reach consensus with an imagined common goal. The common goal is an achievable state with high value, which is obtained by sampling from the distribution of future states. We directly model this distribution with a self-supervised generative model, thus alleviating the "curse of dimensionality" problem induced by the multi-agent multi-step policy rollouts commonly used in model-based methods. We show that such an efficient consensus mechanism can guide all agents to cooperatively reach valuable future states. Results on the Multi-agent Particle-Environments and the Google Research Football environment demonstrate the superiority of MAGI in both sample efficiency and performance.
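
The goal-selection idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian sampler stands in for the learned generative model, `toy_value` stands in for a learned value estimate, and the distance-based intrinsic reward is one plausible shaping choice. All function names, the target point, and the parameters are hypothetical.

```python
import math
import random


def sample_future_states(state, n_samples, rng, scale=1.0):
    """Hypothetical stand-in for a learned generative model of future states.

    Draws candidates by adding Gaussian noise to the current state.
    """
    return [[x + rng.gauss(0.0, scale) for x in state] for _ in range(n_samples)]


def toy_value(state, target=(1.0, 1.0)):
    """Hypothetical stand-in for a learned value estimate.

    States closer to an assumed target point score higher.
    """
    return -math.dist(state, target)


def imagine_common_goal(state, n_samples=16, seed=0):
    """Sample candidate future states and keep the highest-valued one."""
    rng = random.Random(seed)
    candidates = sample_future_states(state, n_samples, rng)
    return max(candidates, key=toy_value)


def intrinsic_reward(agent_state, goal):
    """Assumed shaping term: negative Euclidean distance to the shared goal."""
    return -math.dist(agent_state, goal)


# Every agent receives the same imagined goal, which is what gives them
# a shared target to coordinate on.
goal = imagine_common_goal([0.0, 0.0])
rewards = [intrinsic_reward(s, goal) for s in ([0.0, 0.0], [0.5, 0.5])]
```

Because the goal is sampled directly from a distribution over future states, no joint multi-step rollout over all agents' actions is needed in this sketch, which is the source of the dimensionality saving the abstract refers to.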

Paper Structure (18 sections, 9 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Overview of MAGI. (a) The multi-agent goal imagination module. A CVAE models the future state distribution, from which the goal actor and goal critic sample the common goal. (b) Agent network structure coordinated by the imagined-goal-based consensus mechanism with intrinsic reward and hypernetwork policy. (c) Policy network with goal-based hypernetwork.
  • Figure 2: Illustrations of the Multi-agent Particle-Environments.
  • Figure 3: The average results in four MPE tasks. MAGI outperforms all other methods in sample efficiency and performance. The legend in (a) applies across all plots.
  • Figure 4: Demonstration of the GRF environment and average goal difference results in three GRF scenarios.
  • Figure 5: Demonstration of scalability. MAGI improves performance in the Ten-Agents Treasure Collection task.
  • ...and 3 more figures