
Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input

Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark

TL;DR

The paper demonstrates that cooperative reinforcement-learning agents can develop referential communication protocols from both symbolic, disentangled inputs and raw pixel data. Structured, attribute-based representations support more robust, compositional language, including generalization to novel objects and topographic alignment between meanings and signals. When trained on raw pixel input, agents still communicate above chance but produce more ad-hoc, less interpretable languages, with compositional structure contingent on the disentanglement of underlying factors of variation. Overall, the work scales emergent communication research to realistic perception, showing the critical role of environmental structure in shaping language emergence and grounding.

Abstract

The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.
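The referential game and its reinforcement-learning setup can be made concrete with a toy version. This is a minimal sketch under loudly stated assumptions that do not come from the paper: tabular softmax policies instead of neural networks, single-symbol messages, one distractor per round, and a REINFORCE-style update with a running-average reward baseline. The speaker sees only the target and emits a symbol; the listener sees the symbol plus the shuffled target and distractor and must pick the target; both receive reward 1 on success and 0 otherwise. All function names (`train`, `reinforce_step`, `greedy_accuracy`) and hyperparameters are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs, rng):
    # Inverse-CDF sampling from a categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def reinforce_step(logits, action, advantage, lr):
    # In-place REINFORCE update: d log softmax(logits)[action] / d logits = onehot(action) - probs.
    probs = softmax(logits)
    for k in range(len(logits)):
        logits[k] += lr * advantage * ((1.0 if k == action else 0.0) - probs[k])

def train(n_objects=4, vocab=4, steps=5000, lr=0.5, seed=0):
    # Hypothetical toy setup: tabular agents, one-symbol messages, one distractor.
    rng = random.Random(seed)
    speaker = [[0.0] * vocab for _ in range(n_objects)]   # speaker[object][symbol]
    listener = [[0.0] * n_objects for _ in range(vocab)]  # listener[symbol][object]
    baseline = 0.0
    for _ in range(steps):
        target, distractor = rng.sample(range(n_objects), 2)
        candidates = [target, distractor]
        rng.shuffle(candidates)
        symbol = sample(softmax(speaker[target]), rng)    # speaker sees only the target
        cand_logits = [listener[symbol][c] for c in candidates]
        choice = sample(softmax(cand_logits), rng)        # listener picks a candidate
        reward = 1.0 if candidates[choice] == target else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)            # running-average baseline
        reinforce_step(speaker[target], symbol, advantage, lr)
        reinforce_step(cand_logits, choice, advantage, lr)
        for c, v in zip(candidates, cand_logits):         # write listener update back
            listener[symbol][c] = v
    return speaker, listener

def greedy_accuracy(speaker, listener):
    # Fraction of (target, distractor) pairs solved when both agents act greedily;
    # ties in the listener's scores count as failures.
    n = len(speaker)
    correct = total = 0
    for t in range(n):
        sym = max(range(len(speaker[t])), key=lambda k: speaker[t][k])
        for d in range(n):
            if d != t:
                total += 1
                correct += listener[sym][t] > listener[sym][d]
    return correct / total
```

Usage: `speaker, listener = train()` followed by `greedy_accuracy(speaker, listener)` reports how often the learned protocol identifies the target. Chance level in this one-distractor toy is 0.5; the paper's setting (neural agents, multi-symbol messages, symbolic or pixel input) is considerably richer.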

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: High-level overview of the referential game.
  • Figure 2: Training curves of different experimental setups with uniform and context-dependent target selection.
  • Figure 3: Left: Three languages with different properties, taken from Brighton and Kirby (2006). The mapping between states and signals shown in (b) is random; there is no relationship between points in the meaning and signal space. In (c) and (d), similar meanings map to similar signals, i.e., there is a topographic relation between meanings and signals. Right: Relation between objects' cosine similarity and their message Levenshtein distance for trained and random agents.
  • Figure 4: Target images and their associated messages from game A and game B.
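The topographic relation in Figure 3 compares how similar two objects are with how similar their messages are. A minimal sketch of one way to quantify it, assuming (not confirmed by this summary) that objects are attribute vectors, similarity is cosine similarity, message distance is Levenshtein edit distance, and the two are related by Spearman rank correlation over all object pairs. Under this convention, a strongly negative value indicates a topographic language (similar meanings get messages that are close in edit distance). The function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def ranks(xs):
    # 1-based ranks; tied values receive the average of their ranks.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    # Undefined (division by zero) if either input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def topographic_rho(objects, messages):
    # Spearman correlation = Pearson correlation of the ranks.
    sims, dists = [], []
    for i, j in combinations(range(len(objects)), 2):
        sims.append(cosine(objects[i], objects[j]))
        dists.append(levenshtein(messages[i], messages[j]))
    return pearson(ranks(sims), ranks(dists))
```

For example, with one-hot color and shape attributes (red circle `[1,0,1,0]`, red square `[1,0,0,1]`, blue circle `[0,1,1,0]`, blue square `[0,1,0,1]`) and the perfectly compositional messages `"ac"`, `"ad"`, `"bc"`, `"bd"`, `topographic_rho` returns -1.0: objects sharing an attribute always receive messages one edit apart, and objects sharing none receive messages two edits apart.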