
Learning to Communicate Through Implicit Communication Channels

Han Wang, Binbin Chen, Tieying Zhang, Baoxiang Wang

TL;DR

The paper addresses communication in cooperative multi-agent systems where explicit channels are unavailable by introducing the Implicit Channel Protocol (ICP). ICP encodes information into scouting actions through a centralized mapping. This creates a broadcastable implicit channel and enables end-to-end learning of both action policies and messaging, using approaches such as random and delayed information maps, hat mapping, and Gumbel-Softmax. ICP shows significant performance gains on Guessing Numbers, Revealing Goals, and Hanabi, and outperforms ToM-based and baseline methods. Most notably, it scores 24.91/25 in Hanabi, close to the theoretical maximum. The work also highlights lower computational overhead than ToM reasoning, compatibility with multiple training algorithms, and directions for using environmental information to further improve implicit communication in multi-agent reinforcement learning (MARL).
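The TL;DR lists Gumbel-Softmax among the ways ICP learns messaging end to end. Below is a minimal, forward-only illustration of that relaxation. It assumes the speaker produces one logit per scouting action. The function `gumbel_softmax` and the example logits are hypothetical and do not come from the paper. The procedure adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax. The result is a relaxed one-hot vector, and its argmax is a sample from the categorical distribution that the logits define.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over scouting actions (forward pass only)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1);
        # clamp u away from 0 so log(u) stays finite.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical speaker logits over three scouting actions.
rng = random.Random(0)
probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
# The scouting action actually played is the argmax of the relaxed sample.
action = max(range(len(probs)), key=probs.__getitem__)
```

In an autodiff framework such as PyTorch (`torch.nn.functional.gumbel_softmax`), gradients flow through the soft probabilities, and lower temperatures push those probabilities closer to one-hot. The straight-through variant sends the hard argmax forward but uses the soft gradient in the backward pass.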

Abstract

Effective communication is an essential component of collaborative multi-agent systems. Situations where explicit messaging is not feasible have been common throughout human history, and they motivate the study of implicit communication. Previous work on learning implicit communication mostly relies on theory of mind (ToM), in which agents infer the mental states and intentions of others by interpreting their actions. However, ToM-based methods struggle to make accurate inferences in complex tasks. In this work, we propose the Implicit Channel Protocol (ICP) framework, which allows agents to communicate through implicit communication channels analogous to explicit ones. ICP uses a subset of actions, called scouting actions, together with a mapping between information and these scouting actions that encodes and decodes messages. We propose training algorithms that teach agents both to message and to act. These include learning with a randomly initialized information map and learning with a delayed information map. We evaluate ICP on Guessing Numbers, Revealing Goals, and Hanabi. On all three tasks it significantly outperforms baseline methods through more efficient information transmission.
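The abstract describes a mapping between information and scouting actions that both encodes and decodes messages, along with a variant that starts from a randomly initialized map. Here is a minimal sketch of that idea. It assumes a one-to-one map between message ids and scouting actions, shared by every agent (for example, through a common seed). The class `InformationMap` and the action names, which are loosely modeled on Hanabi hints, are hypothetical.

```python
import random


class InformationMap:
    """Hypothetical shared one-to-one map between message ids and scouting actions."""

    def __init__(self, scouting_actions, seed=0):
        actions = list(scouting_actions)
        if len(set(actions)) != len(actions):
            raise ValueError("scouting actions must be distinct")
        # Random initialization: shuffle once with a seed all agents share,
        # so every agent ends up holding the identical map.
        random.Random(seed).shuffle(actions)
        self._encode = dict(enumerate(actions))  # message id -> scouting action
        self._decode = {a: m for m, a in self._encode.items()}  # action -> message id

    def encode(self, message):
        """Speaker side: turn a message id into the scouting action to play."""
        return self._encode[message]

    def decode(self, action):
        """Listener side: turn an observed scouting action back into a message id."""
        return self._decode[action]


# Hypothetical scouting actions, loosely modeled on Hanabi hints.
imap = InformationMap(["hint_red", "hint_blue", "hint_1", "hint_2"], seed=42)
action = imap.encode(2)
received = imap.decode(action)  # any agent observing `action` decodes message 2
```

Every agent knows the same bijection, so any agent that observes a scouting action recovers the same message. This shared knowledge is what makes the channel broadcastable. The sketch does not reproduce how the paper initializes, delays, or updates the map during training.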

Paper Structure

This paper contains 27 sections, 1 equation, 4 figures, and 1 algorithm.

Figures (4)

  • Figure 1: Left: Guessing Numbers environment. Agents cannot see their own digits, but they can reveal the states of other agents' digit segments by collaboratively giving hints. This lets them deduce their own digits and obtain a shared reward. Right: Revealing Goals environment. Agents are placed at random positions in a grid world, and each is assigned a unique target. Each agent can observe only the other agents' targets, not its own. By revealing each other's targets in nearby grid cells, the agents can eventually find and reach their own targets.
  • Figure 2: (a): Training curves for Guessing Numbers over a total of 100k steps with $N=3, l=11$. (b): Average episode length of different algorithms in Guessing Numbers.
  • Figure 3: (a): Training curves for Revealing Goals over a total of 100k training steps with $N=4, H=5, T=50$. (b): Training curves for 4-player Hanabi. Training takes around 150 hours with on-policy algorithms and around 20 hours with off-policy algorithms.
  • Figure 4: Shuffling the embedding of information into scouting actions also changes the environment information, even when the information strategy stays the same. After fine-tuning, implementations that share the same information strategy but use different embeddings reach different levels of performance.
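In Figure 1's Guessing Numbers, each agent sees every digit except its own, which is the classic hat-guessing setting. Below is a minimal sketch of the modular-sum trick usually called the hat strategy. It assumes that the paper's hat mapping works along these lines, although this summary does not state its exact form. In the sketch, the speaker broadcasts the sum of the digits it can see, modulo the number of possible values `k`. Each other agent then subtracts the digits it can see and recovers its own digit. The functions `hat_message` and `hat_decode` are hypothetical, and the example digits are illustrative.

```python
def hat_message(digits, speaker, k):
    """Speaker: broadcast the sum of every digit it can see, modulo k."""
    return sum(d for i, d in enumerate(digits) if i != speaker) % k


def hat_decode(message, visible, listener, speaker, k):
    """Listener: recover its own hidden digit from a single broadcast.

    `visible` maps agent index -> digit for every agent except the listener.
    """
    if listener == speaker:
        raise ValueError("the speaker cannot decode its own digit from its message")
    # Remove the digits the listener can see, except the speaker's,
    # which the speaker did not include in its sum.
    seen = sum(d for j, d in visible.items() if j != speaker)
    return (message - seen) % k


# Hypothetical hidden digits, one per agent, each in range(k).
digits = [3, 7, 1, 9]
k, speaker = 10, 0
msg = hat_message(digits, speaker, k)  # (7 + 1 + 9) % 10 == 7
for listener in range(1, len(digits)):
    visible = {j: d for j, d in enumerate(digits) if j != listener}
    assert hat_decode(msg, visible, listener, speaker, k) == digits[listener]
```

With this code, a single broadcast tells every listener a different private digit at the same time, which makes it a good fit for a broadcast channel. The code needs `k` distinct message values. If each message is one scouting action, the speaker therefore needs at least `k` scouting actions.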