Learning Multiagent Communication with Backpropagation

Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

TL;DR

The paper introduces CommNet, a differentiable framework enabling continuous communication among cooperating agents under partial observability and dynamic team sizes. By learning both control and inter-agent messaging through backpropagation, the model achieves improved coordination across diverse tasks, including lever-pulling, traffic management, combat, and reasoning challenges like bAbI. Key contributions include a permutation-invariant broadcast-communication mechanism, several architectural extensions (local connectivity, skip connections, temporal recurrence), and empirical evidence that learned communication can be sparse yet meaningful. The results indicate practical benefits for multi-agent systems and offer insights into interpretable communication strategies, with future work aimed at heterogeneous agents and scaling to larger teams.

Abstract

Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.

Paper Structure

This paper contains 18 sections, 11 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: An overview of our CommNet model. Left: view of module $f^i$ for a single agent $j$. Note that the parameters are shared across all agents. Middle: a single communication step, where each agent's module propagates its internal state $h$ and broadcasts a communication vector $c$ on a common channel (shown in red). Right: full model $\Phi$, showing input states $s$ for each agent, two communication steps and the output actions for each agent.
  • Figure 2: Left: Traffic junction task where agent-controlled cars (colored circles) have to pass through the junction without colliding. Middle: The combat task, where model-controlled agents (red circles) fight against enemy bots (blue circles). In both tasks each agent has limited visibility (orange region) and thus cannot see the location of all other agents. Right: As visibility in the environment decreases, the importance of communication grows in the traffic junction task.
  • Figure 3: Left: First two principal components of communication vectors $\tilde{c}$ from multiple runs on the traffic junction task (Fig. 2, left). While the majority are "silent" (i.e. have a small norm), distinct clusters are also present. Middle: for three of these clusters, we probe the model to understand their meaning (see text for details). Right: First two principal components of hidden state vectors $h$ from the same runs as on the left, with corresponding color coding. Note how many of the "silent" communication vectors accompany non-zero hidden state vectors. This shows that the two pathways carry different information.
  • Figure 4: 3D PCA plot of hidden states of agents
  • Figure 5: A harder version of traffic task with four connected junctions.
  • ...and 1 more figures
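
The communication step described in the Figure 1 caption can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each agent's communication vector $c$ is the mean of the other agents' hidden states $h$, and that the shared module is a single tanh layer with two weight matrices. The names `comm_step`, `matvec`, `H`, and `C` are hypothetical and chosen only for this example.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def comm_step(h, H, C):
    """One communication step with parameters shared across all agents.

    h: list of hidden-state vectors, one per agent.
    H, C: weight matrices applied to an agent's own state and to the
          communication vector (illustrative names, assumed for this sketch).
    """
    n, d = len(h), len(h[0])
    # Sum of all hidden states, reused to form each agent's average of the others.
    total = [sum(h[j][k] for j in range(n)) for k in range(d)]
    new_h = []
    for j in range(n):
        if n > 1:
            # Assumed rule: c_j is the mean of the other agents' hidden states.
            c_j = [(total[k] - h[j][k]) / (n - 1) for k in range(d)]
        else:
            c_j = [0.0] * d  # a lone agent receives no communication
        own = matvec(H, h[j])
        comm = matvec(C, c_j)
        new_h.append([math.tanh(a + b) for a, b in zip(own, comm)])
    return new_h


rng = random.Random(0)
d = 3
H = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
C = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

# Four agents, two communication steps, as in the right panel of Figure 1.
h0 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(4)]
h1 = comm_step(h0, H, C)
h2 = comm_step(h1, H, C)
```

Because every agent applies the same `H` and `C` and only sees an average of the others, reordering the agents reorders the outputs in the same way, and the same function works for any number of agents.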