Multi-Agent Cooperation and the Emergence of (Natural) Language
Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni
TL;DR
The paper investigates how language can emerge from multi-agent cooperation in referential games, as an alternative to training interactive systems passively on large text corpora. It introduces a simple framework with a sender agent, a receiver agent, a discrete communication bottleneck, and reinforcement learning, and evaluates it on real-image datasets with two agent architectures. The agents learn to coordinate effectively, and the symbols they use come to align with high-level semantic properties of the images. Manipulations of the game environment and a supervised grounding step push these meanings toward human-interpretable language, with partial success when humans interpret the agents' messages on ReferItGame images. The work contributes to grounding emergent communication in real-world semantics and suggests a path toward conversational agents that can collaborate with humans.
Abstract
The current mainstream approach to train natural language systems is to expose them to large amounts of text. This passive learning is problematic if we are interested in developing interactive machines, such as conversational agents. We propose a framework for language learning that relies on multi-agent communication. We study this learning in the context of referential games. In these games, a sender and a receiver see a pair of images. The sender is told one of them is the target and is allowed to send a message from a fixed, arbitrary vocabulary to the receiver. The receiver must rely on this message to identify the target. Thus, the agents develop their own language interactively out of the need to communicate. We show that two networks with simple configurations are able to learn to coordinate in the referential game. We further explore how to make changes to the game environment to cause the "word meanings" induced in the game to better reflect intuitive semantic properties of the images. In addition, we present a simple strategy for grounding the agents' code into natural language. Both of these are necessary steps towards developing machines that are able to communicate with humans productively.
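The referential game described in the abstract can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' implementation. Images are replaced by abstract concept ids. Both agents are lookup tables of logits rather than neural networks. The sender sees only the target, whereas the abstract's sender sees both images. Training uses plain REINFORCE with a running-mean reward baseline, which is one common reinforcement learning update and is assumed here. All names (`train_referential_game`, `greedy_accuracy`, `vocab_size`, `n_concepts`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    """Draw an index from the categorical distribution `probs`."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def train_referential_game(n_concepts=4, vocab_size=16, steps=20000, lr=0.2, seed=0):
    """Train a tabular sender/receiver pair with REINFORCE (illustrative sketch)."""
    rng = random.Random(seed)
    # Sender logits: one row per target concept, one column per vocabulary symbol.
    sender = [[0.0] * vocab_size for _ in range(n_concepts)]
    # Receiver logits: score of each candidate concept given the received symbol.
    receiver = [[0.0] * n_concepts for _ in range(vocab_size)]
    baseline = 0.0  # running mean of the reward, used to reduce gradient variance

    for _ in range(steps):
        # Pick a target and a distinct distractor, shown to the receiver in random order.
        target, distractor = rng.sample(range(n_concepts), 2)
        candidates = [target, distractor]
        rng.shuffle(candidates)

        # The sender emits one discrete symbol for the target.
        s_probs = softmax(sender[target])
        symbol = sample(s_probs, rng)

        # The receiver sees only the symbol and the two candidates, then picks one.
        r_probs = softmax([receiver[symbol][c] for c in candidates])
        choice = sample(r_probs, rng)

        # Shared reward: 1 if the receiver found the target, else 0.
        reward = 1.0 if candidates[choice] == target else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)

        # REINFORCE: the gradient of log-softmax w.r.t. the logits is one_hot(action) - probs.
        for v in range(vocab_size):
            grad = (1.0 if v == symbol else 0.0) - s_probs[v]
            sender[target][v] += lr * advantage * grad
        for i, c in enumerate(candidates):
            grad = (1.0 if i == choice else 0.0) - r_probs[i]
            receiver[symbol][c] += lr * advantage * grad

    return sender, receiver


def greedy_accuracy(sender, receiver):
    """Greedy accuracy over every ordered (target, distractor) pair, in both positions."""
    n = len(sender)
    correct = total = 0
    for t in range(n):
        # The sender's most likely symbol for target t.
        symbol = max(range(len(sender[t])), key=lambda v: sender[t][v])
        for d in range(n):
            if t == d:
                continue
            for candidates in ([t, d], [d, t]):
                scores = [receiver[symbol][c] for c in candidates]
                choice = max(range(2), key=lambda i: scores[i])
                correct += int(candidates[choice] == t)
                total += 1
    return correct / total


if __name__ == "__main__":
    sender, receiver = train_referential_game()
    print(f"greedy accuracy: {greedy_accuracy(sender, receiver):.2f}")
```

Untrained agents score exactly chance (0.5) under this evaluation, so any improvement reflects a learned code. With a vocabulary larger than the number of concepts, the agents usually settle on distinct symbols per concept. REINFORCE can still leave two concepts sharing one symbol, which caps accuracy below 1.0.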
