Emergent Multi-Agent Communication in the Deep Learning Era

Angeliki Lazaridou, Marco Baroni

TL;DR

Deep agent communities can develop language through interaction under reinforcement learning, with continuous and discrete channels enabling different learning dynamics. The paper surveys representative studies, methods to measure genuine communication, and the emergence of compositional structure under varied tasks and environments. It discusses how emergent language can improve inter-agent coordination, enable negotiations with self-interested agents, and facilitate human–machine collaboration, while highlighting risks of degenerate signals and language drift. It concludes with open questions and directions toward grounding emergent protocols in human language and leveraging pre-trained models for practical AI systems.

Abstract

The ability to cooperate through language is a defining feature of humans. As the perceptual, motor and planning capabilities of deep artificial networks increase, researchers are studying whether they can also develop a shared language to interact. From a scientific perspective, understanding the conditions under which language evolves in communities of deep agents, and the features that emerge, can shed light on human language evolution. From an applied perspective, endowing deep networks with the ability to solve problems interactively by communicating with each other and with us should make them more flexible and useful in everyday life. This article surveys representative recent language emergence studies from both angles.

Paper Structure

This paper contains 12 sections, 4 figures.

Figures (4)

  • Figure 1: Examples of games and environments for emergent communication. (a) Emergent communication work in the pre-deep-learning era typically used symbolic data as input: ? (? ) presents a study where recurrent neural network agents communicate in a referential game using sequences of discrete symbols. Similar work with deep networks often uses realistic pictures as input; see Fig. 3 for an example. (b) More complex scenarios with deep agents: ? (? ) study self-interested agents engaging in a multi-turn negotiation game. (c) Richer, dynamic environments: ? (? ) study five embodied self-interested agents engaging in multi-turn interactions while navigating in a 2D visual environment. (d) Scaling up to fully realistic scenarios: in the experiment of ? (? ), embodied cooperative agents solve navigation challenges in a 3D environment. Images from ? (? ) and ? (? ) reproduced by permission.
  • Figure 2: Typical neural network components of a deep agent. (a) A visual processing module (typically a convolutional network) converting pictures into internal distributed representations. (b) A generation component consisting of a recurrent neural network that produces a symbol sequence (in this case, $AXZ$). (c) An understanding module that takes as input a sequence of units (in this case, the symbols produced by the generation component) and produces an internal distributed representation. A typical sender agent will first transform images into distributed representations with (a) and then use (b) to produce a message. A receiver agent will also use (a) to transform images into representations, and then (c) to process the message from the sender in order to decide on an output action. In both cases, further layers are interspersed with the various components to aid the agents' "reasoning" process (e.g., the receiver might use them to combine visual and verbal information).
  • Figure 3: The referential game of ? (? ). In a referential game, successful communication is the very purpose of the game (as opposed to scenarios in which communication can help players to achieve an independent goal, such as obtaining a valuable object). Referential games have a long history in linguistics, philosophy and game theory (Lewis, 1969; Skyrms, 2010). In the game illustrated here, the sender network receives as input two natural images, depicting instances of two distinct categories out of about 500 (here: a dog and a car), with one of the images marked as target (here, the car). The sender processes the images with a convolutional network module and emits one symbol (sampled from a fixed alphabet), which is given as input to the receiver network, together with the two images (in random order). If the receiver "points" to the correct location of the target in the image array (as it does in the figure), both agents are rewarded. The networks are trained by letting them play the game many times and adjusting their weights based on the reward signal. No supervision is provided about the symbols to be used for communication, so the agents are completely free to adapt the emergent protocol to their strategies and biases.
  • Figure 4: Training and test inputs in the referential game of Bouchacourt and Baroni (2018). Two agents were trained to play the game of ? (? ) (see Fig. 3). During training, the agents were exposed to the same data as in the original study, that is, pairs of pictures of instances of about 500 distinct objects (top row). At test time, however, the agents were made to play the game with blobs of Gaussian noise (bottom row). They were able to communicate about them nearly as well as about the training pictures. This shows that the language emerging in this game does not involve "words" referring to generic concepts, but rather ad-hoc signals, probably carrying comparative information about shallow visual properties of the images. Bottom row reproduced from ? (? ) by permission.
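
The reward-driven loop described in the Figure 3 caption can be illustrated with a toy sketch. This is not the setup of the surveyed study: it assumes symbolic category ids in place of natural images, lookup tables of logits in place of convolutional networks, a sender that conditions on the target only, and a plain REINFORCE update with a fixed chance-level baseline. All names (`play_round`, `train`, `evaluate`) and hyperparameters are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def play_round(sender, receiver, n_categories, rng, lr=0.0):
    """Play one game; update both tables in place when lr > 0."""
    # Two distinct categories stand in for the two images (assumption).
    target, distractor = rng.sample(range(n_categories), 2)
    # Sender emits one symbol from a fixed alphabet, given the target.
    p_sym = softmax(sender[target])
    symbol = sample(p_sym, rng)
    # Receiver sees the symbol and both candidates in random order.
    candidates = [target, distractor]
    rng.shuffle(candidates)
    p_pick = softmax([receiver[symbol][c] for c in candidates])
    pick = sample(p_pick, rng)
    # Shared reward: both agents are rewarded only on a correct guess.
    reward = 1.0 if candidates[pick] == target else 0.0
    if lr > 0.0:
        advantage = reward - 0.5  # baseline = chance accuracy (assumption)
        # REINFORCE: d log p(a) / d logit_k = 1[k == a] - p_k
        for k in range(len(p_sym)):
            sender[target][k] += lr * advantage * ((k == symbol) - p_sym[k])
        for i, c in enumerate(candidates):
            receiver[symbol][c] += lr * advantage * ((i == pick) - p_pick[i])
    return reward


def train(n_categories=5, vocab_size=5, rounds=20000, lr=0.5, seed=0):
    """Train sender and receiver tables from scratch by repeated play."""
    rng = random.Random(seed)
    sender = [[0.0] * vocab_size for _ in range(n_categories)]
    receiver = [[0.0] * n_categories for _ in range(vocab_size)]
    for _ in range(rounds):
        play_round(sender, receiver, n_categories, rng, lr)
    return sender, receiver, rng


def evaluate(sender, receiver, n_categories, rng, rounds=2000):
    """Average reward over games played without any weight update."""
    wins = sum(play_round(sender, receiver, n_categories, rng)
               for _ in range(rounds))
    return wins / rounds
```

Calling `train()` and then `evaluate(sender, receiver, 5, rng)` gives the average reward of the trained pair; a value clearly above the 0.5 chance level would indicate that the symbols have come to carry information about the target. As in the caption, no supervision says which symbol should name which category, so the mapping that emerges depends on the random seed.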