Learning to Communicate Through Implicit Communication Channels
Han Wang, Binbin Chen, Tieying Zhang, Baoxiang Wang
TL;DR
The paper tackles communication in cooperative multi-agent systems where explicit channels are unavailable by introducing the Implicit Channel Protocol (ICP). ICP encodes information into scouting actions via a centralized mapping, creating a broadcastable implicit channel and enabling end-to-end learning of both action policies and messaging, using techniques such as random and delayed information maps, hat mapping, and Gumbel-Softmax. ICP shows significant performance gains on Guessing Numbers, Revealing Goals, and Hanabi, notably scoring 24.91/25 in Hanabi, near the theoretical maximum, and outperforming ToM-based and baseline methods. The work highlights lower computational overhead than ToM reasoning, compatibility with multiple training algorithms, and the potential to leverage environmental information to further enhance implicit communication in multi-agent reinforcement learning (MARL).
Abstract
Effective communication is an essential component in collaborative multi-agent systems. Situations where explicit messaging is not feasible have been common in human society throughout history, which motivates the study of implicit communication. Previous works on learning implicit communication mostly rely on theory of mind (ToM), where agents infer the mental states and intentions of others by interpreting their actions. However, ToM-based methods become less effective at making accurate inferences in complex tasks. In this work, we propose the Implicit Channel Protocol (ICP) framework, which allows agents to communicate through implicit communication channels similar to explicit ones. ICP leverages a subset of actions, denoted as the scouting actions, and a mapping between information and these scouting actions that encodes and decodes the messages. We propose training algorithms for agents to message and act, including learning with a randomly initialized information map and with a delayed information map. The efficacy of ICP has been tested on the tasks of Guessing Numbers, Revealing Goals, and Hanabi, where ICP significantly outperforms baseline methods through more efficient information transmission.
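To make the idea of a mapping between information and scouting actions concrete, the following is a minimal sketch and not the authors' implementation. It assumes the map is a fixed one-to-one table shared by all agents and initialized at random. The class name `InformationMap`, the action names, and the integer message symbols are hypothetical and chosen only for illustration.

```python
import random


class InformationMap:
    """Hypothetical shared one-to-one table between messages and scouting actions."""

    def __init__(self, scouting_actions, seed=0):
        # Assumption: every agent builds the same table from a shared seed,
        # standing in for the centralized mapping described in the text.
        rng = random.Random(seed)
        shuffled = list(scouting_actions)
        rng.shuffle(shuffled)
        # Message symbol i is sent by taking scouting action shuffled[i].
        self.encode_table = dict(enumerate(shuffled))
        self.decode_table = {a: m for m, a in self.encode_table.items()}

    def encode(self, message):
        """Return the scouting action that carries the given message symbol."""
        return self.encode_table[message]

    def decode(self, action):
        """Return the message carried by an observed action, or None if the
        action is not a scouting action and so carries no message."""
        return self.decode_table.get(action)


# Illustrative action names; a real environment defines its own action set.
scouting = ["hint_red", "hint_blue", "hint_1", "hint_2"]
info_map = InformationMap(scouting, seed=42)

sent_action = info_map.encode(2)          # sender picks the action for symbol 2
received = info_map.decode(sent_action)   # observers recover the symbol
```

Because the table is one-to-one, any agent that observes a scouting action can recover the message exactly, while actions outside the scouting subset decode to `None` and are treated as ordinary moves.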
