Multi-Agent Cooperation and the Emergence of (Natural) Language

Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni

TL;DR

The paper investigates how language can emerge from multi-agent cooperation in referential games, as an alternative to passively training on large amounts of text, which is ill-suited to interactive AI. It introduces a simple framework with sender and receiver agents, a discrete communication bottleneck, and reinforcement learning, and tests it on real-image datasets with two architectures. The agents learn to coordinate effectively and develop symbol meanings aligned with high-level semantic properties of the images. Environment manipulations and a supervised grounding step push these meanings toward human-interpretable language, with partial success on ReferItGame image pairs. The work contributes to grounding emergent communication in real-world semantics and points toward conversational agents that can collaborate with humans.

Abstract

The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in developing interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards developing machines that are able to communicate with humans productively.
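
The game protocol described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are stood in for by integer category ids, the vocabulary size `VOCAB_SIZE` is arbitrary, the receiver's view is shuffled so that position carries no information, and the agents `toy_sender` and `toy_receiver` are hand-written rather than learned. The reinforcement learning step that would train real agents from the shared reward is omitted.

```python
import random

VOCAB_SIZE = 10  # hypothetical size of the fixed, arbitrary vocabulary


def play_round(sender, receiver, image_a, image_b, rng):
    """Play one referential game round; return the shared reward (1 or 0)."""
    images = [image_a, image_b]
    target_idx = rng.randrange(2)        # the sender is told which image is the target
    symbol = sender(images, target_idx)  # a single symbol from the vocabulary
    if not 0 <= symbol < VOCAB_SIZE:
        raise ValueError("symbol outside the vocabulary")
    # Assumption: the receiver sees the same pair in shuffled order.
    order = [0, 1]
    rng.shuffle(order)
    shown = [images[i] for i in order]
    guess = receiver(shown, symbol)      # index into `shown`
    return 1 if order[guess] == target_idx else 0


def toy_sender(images, target_idx):
    """Hand-written stand-in: name the target by its category id."""
    return images[target_idx] % VOCAB_SIZE


def toy_receiver(shown, symbol):
    """Hand-written stand-in: pick the image whose category matches the symbol."""
    for i, image in enumerate(shown):
        if image % VOCAB_SIZE == symbol:
            return i
    return 0  # fall back to the first image if nothing matches


def success_rate(sender, receiver, pairs, seed=0):
    """Average reward over a list of (image_a, image_b) pairs."""
    rng = random.Random(seed)
    rewards = [play_round(sender, receiver, a, b, rng) for a, b in pairs]
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    pairs = [(3, 7), (1, 4), (0, 9)] * 10
    print(success_rate(toy_sender, toy_receiver, pairs))
```

In this toy setting the two agents already share a code, so every round succeeds whenever the two images differ in category. In the paper's setting, that shared code is exactly what the agents have to develop through interaction.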

Paper Structure

This paper contains 10 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Architectures of agent players.
  • Figure 2: Left: Communication success as a function of training iterations; informed senders converge faster than agnostic ones. Right: Spectrum of an example symbol usage matrix. The first few dimensions capture only part of the variance, suggesting that the informed sender's use of more symbols is not just due to synonymy.
  • Figure 3: t-SNE plots of object fc vectors color-coded by majority symbols assigned to them by informed sender. Object class names shown for a random subset. Left: configuration of 4th row of Table \ref{tab:exp1_table}. Right: 2nd row of Table \ref{tab:exp2_multiinstance}.
  • Figure 4: Example pairs from the ReferItGame set, with the word produced by the sender. Target images framed in green.