
Multi-Agent Coordination via Multi-Level Communication

Ziluo Ding, Zeyuan Liu, Zhirui Fang, Kefan Su, Liwen Zhu, Zongqing Lu

TL;DR

Theoretically, the policies learned by SeqComm are proved to improve monotonically and converge; empirically, SeqComm outperforms existing methods in various cooperative multi-agent tasks.

Abstract

The partial observability and stochasticity in multi-agent settings can be mitigated by accessing more information about others via communication. However, a coordination problem remains: because of circular dependencies, agents cannot all communicate their actual actions to each other at the same time. In this paper, we propose a novel multi-level communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (the upper-level agents make decisions before the lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating the hidden states of their observations and comparing the values of their intentions, which are obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and then communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various cooperative multi-agent tasks.
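The two communication phases described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each agent's value of intention is abstracted as a hypothetical callable over the broadcast hidden states; SeqComm itself obtains this value by modeling the environment dynamics. Agents are ranked by that value, with ties broken by agent id (also an assumption). Each agent's hypothetical `act` policy receives the actions already announced by upper-level agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces, for illustration only (not SeqComm's actual API).
IntentionFn = Callable[[Sequence[float]], float]   # joint hidden states -> value of intention
PolicyFn = Callable[[float, Dict[int, int]], int]  # (own hidden state, upper-level actions) -> action


@dataclass
class Agent:
    agent_id: int
    intention_value: IntentionFn  # stand-in for a world-model-based estimate
    act: PolicyFn


def negotiation_phase(agents: List[Agent], hidden_states: Sequence[float]) -> List[int]:
    """Every agent sees all broadcast hidden states and reports its intention value.
    A higher value means a higher decision-making level (ties broken by agent id)."""
    values = {a.agent_id: a.intention_value(hidden_states) for a in agents}
    return sorted(values, key=lambda i: (-values[i], i))


def launching_phase(agents: List[Agent], hidden_states: Sequence[float],
                    order: List[int]) -> Dict[int, int]:
    """Agents decide in priority order. Each action is broadcast to every agent
    that has not yet decided, so lower-level agents condition on it."""
    by_id = {a.agent_id: a for a in agents}
    announced: Dict[int, int] = {}
    for agent_id in order:
        action = by_id[agent_id].act(hidden_states[agent_id], dict(announced))
        announced[agent_id] = action
    return announced


if __name__ == "__main__":
    # Toy agents: constant intention values; each action equals the number of
    # actions already announced by upper-level agents.
    agents = [Agent(i, lambda hs, v=v: v, lambda h, upper: len(upper))
              for i, v in enumerate([1.0, 3.0, 2.0])]
    states = [0.0, 0.0, 0.0]
    order = negotiation_phase(agents, states)
    print(order)                                    # [1, 2, 0]
    print(launching_phase(agents, states, order))   # {1: 0, 2: 1, 0: 2}
```

The toy policy shows that information flows in one direction only. The top-level agent acts on nothing but its own state, and each later agent sees every action announced above it. This is what breaks the circular dependency that blocks simultaneous action communication.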

Paper Structure (22 sections, 7 theorems, 21 equations, 15 figures)

Key Result

Proposition 1

If all the agents update their policies sequentially, each with an individual TRPO schulman2015trust update, in multi-agent sequential decision-making, then the joint policy of all agents is guaranteed to improve monotonically and converge.
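Here is a hedged reading of Proposition 1; it is not necessarily the paper's proof. Assume the standard TRPO lower bound of schulman2015trust holds for each agent's update while the other agents' policies are held fixed. The guarantee then follows by chaining per-agent improvements. The notation is ours:

- $\eta$ is the expected discounted return and $L_{\pi}$ is the TRPO surrogate.
- $\epsilon = \max_{s,a}|A_{\pi}(s,a)|$.
- $\boldsymbol{\pi}_k^{(i)}$ is the joint policy in iteration $k$ after agents $1,\dots,i$ have updated, so $\boldsymbol{\pi}_k^{(0)} = \boldsymbol{\pi}_k$ and $\boldsymbol{\pi}_k^{(n)} = \boldsymbol{\pi}_{k+1}$.

```latex
\begin{aligned}
&\eta(\tilde{\pi}) \ge L_{\pi}(\tilde{\pi}) - C\,D_{\mathrm{KL}}^{\max}(\pi,\tilde{\pi}),
  \qquad C = \frac{4\epsilon\gamma}{(1-\gamma)^{2}}, \\
&\eta\big(\boldsymbol{\pi}_k^{(i)}\big) \ge \eta\big(\boldsymbol{\pi}_k^{(i-1)}\big), \qquad i = 1,\dots,n, \\
&\eta(\boldsymbol{\pi}_{k+1}) = \eta\big(\boldsymbol{\pi}_k^{(n)}\big) \ge \cdots \ge \eta\big(\boldsymbol{\pi}_k^{(0)}\big) = \eta(\boldsymbol{\pi}_k).
\end{aligned}
```

The second line holds because agent $i$ maximizes the right-hand side of the first line over its own policy only, with $\pi = \boldsymbol{\pi}_k^{(i-1)}$. Keeping its old policy is a feasible choice and gives $L_{\pi}(\pi) - 0 = \eta(\pi)$. The maximizer's bound, and hence its return, is therefore at least $\eta(\boldsymbol{\pi}_k^{(i-1)})$. With bounded rewards and $\gamma < 1$, $\eta$ is bounded above, so the non-decreasing sequence $\eta(\boldsymbol{\pi}_k)$ converges.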

Figures (15)

  • Figure 1: (a) Payoff matrix for a one-step game with multiple local optima. (b) Evaluations of different methods for the game in terms of the mean reward and standard deviation over ten runs. $A \rightarrow B$, $B \rightarrow A$, Simultaneous, and Learned denote, respectively, that agent $A$ makes decisions first, that agent $B$ makes decisions first, that the two agents make decisions simultaneously, and that an additional learned policy determines the priority of decision-making. MAPPO yu2021surprising is used as the backbone.
  • Figure 2: Overview of SeqComm. SeqComm has two communication phases: the negotiation phase (left) and the launching phase (right). In the negotiation phase, agents communicate the hidden states of their observations to the others and obtain their own intentions. The priority of decision-making is determined by sharing and comparing the values of all intentions. In the launching phase, agents holding upper-level positions make decisions before lower-level agents, and their actions are shared with every agent that has not yet made a decision.
  • Figure 3: Architecture of SeqComm. The critic and policy of each agent take as input its own observation and the received messages. The world model takes as input the joint hidden states and the predicted joint actions.
  • Figure 4: Learning curves of SeqComm and baselines in nine SMACv2 maps.
  • Figure 5: Ablation studies of the communication ranges.
  • ...and 10 more figures
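Figure 1 suggests that the order of decision-making matters in a one-step game with several local optima. A small sketch makes the intuition concrete, using a hypothetical payoff matrix rather than the paper's (the values and the function names `simultaneous` and `sequential` are illustrative assumptions). Simultaneous agents that cannot see each other's actions settle on a safe but suboptimal joint action. If agent $A$ decides first and communicates its action, agent $B$ can best-respond to it. The sequential case is idealized: $A$ is assumed to anticipate $B$'s best response exactly, whereas SeqComm learns its policies.

```python
from typing import List, Tuple

# Hypothetical payoff matrix with several local optima (NOT the paper's Figure 1 values).
# Rows are agent A's actions, columns are agent B's actions; both agents share the payoff.
PAYOFF: List[List[float]] = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]


def simultaneous(payoff: List[List[float]]) -> Tuple[int, int]:
    """Neither agent sees the other's action; each picks the action with the best
    mean payoff, assuming the other agent acts uniformly at random."""
    n_a, n_b = len(payoff), len(payoff[0])
    a = max(range(n_a), key=lambda i: sum(payoff[i]) / n_b)
    b = max(range(n_b), key=lambda j: sum(payoff[i][j] for i in range(n_a)) / n_a)
    return a, b


def sequential(payoff: List[List[float]]) -> Tuple[int, int]:
    """Agent A (upper level) acts first, anticipating B's best response;
    B observes A's communicated action and best-responds to it."""
    a = max(range(len(payoff)), key=lambda i: max(payoff[i]))
    b = max(range(len(payoff[a])), key=lambda j: payoff[a][j])
    return a, b


if __name__ == "__main__":
    for name, decide in [("Simultaneous", simultaneous), ("A -> B", sequential)]:
        a, b = decide(PAYOFF)
        print(name, (a, b), PAYOFF[a][b])
    # Simultaneous (2, 2) 5.0
    # A -> B (0, 0) 11.0
```

The heavy penalty of -30 for miscoordination pushes independent deciders toward the safe row and column. Communicating the leader's actual action removes that risk for the follower.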

Theorems & Definitions (20)

  • Proposition 1
  • proof
  • Claim 1
  • Remark 1
  • Proposition 2
  • proof
  • Theorem 1
  • proof
  • Remark 2
  • Lemma 1: Agent-by-Agent PPO
  • ...and 10 more