Learning Translations: Emergent Communication Pretraining for Cooperative Language Acquisition

Dylan Cope; Peter McBurney

TL;DR

This work proposes a novel AI challenge, the Cooperative Language Acquisition Problem (CLAP), in which the assumptions of Zero-Shot Coordination (ZSC) are relaxed by allowing a 'joiner' agent to learn from a dataset of interactions between agents in a target community.

Abstract

In Emergent Communication (EC), agents learn to communicate with one another, but the protocols that they develop are specialised to their training community. This observation led to research into Zero-Shot Coordination (ZSC) for learning communication strategies that are robust to agents not encountered during training. However, ZSC typically assumes that no prior data is available about the agents that will be encountered in the zero-shot setting. In many cases, this presents an unnecessarily hard problem and rules out communication via pre-established conventions. We propose a novel AI challenge called a Cooperative Language Acquisition Problem (CLAP), in which the ZSC assumptions are relaxed by allowing a 'joiner' agent to learn from a dataset of interactions between agents in a target community. We propose and compare two methods for solving CLAPs: Imitation Learning (IL), and Emergent Communication pretraining and Translation Learning (ECTL), in which an agent is trained in self-play with EC and then learns from the data to translate between the emergent protocol and the target community's protocol.
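The translation step described in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the discrete goals, the lookup-table encoder `enc`, the symbol tables `EMERGENT_CODE` and `TARGET_CODE`, and the count-based translator are all hypothetical stand-ins chosen so the example runs without any dependencies. It only shows the general idea of keeping a pretrained encoder frozen and fitting, from logged target-community data, a supervised mapping from emergent symbols to target-community messages.

```python
from collections import Counter, defaultdict

NUM_GOALS = 5

# Hypothetical frozen encoder, standing in for the result of EC pretraining:
# it maps a sender observation (here, a goal index) to an emergent symbol.
EMERGENT_CODE = [3, 0, 4, 1, 2]


def enc(observation):
    return EMERGENT_CODE[observation]


# Hypothetical target-community convention. The joiner never reads this table
# directly; it only sees (observation, message) pairs logged from the community.
TARGET_CODE = ["e", "a", "c", "b", "d"]


def collect_dataset(num_samples):
    """Simulate logged sender observations and the messages the community sent."""
    data = []
    for i in range(num_samples):
        observation = i % NUM_GOALS
        data.append((observation, TARGET_CODE[observation]))
    return data


def fit_forward_translator(dataset):
    """Supervised, count-based fit of emergent symbol -> target message."""
    counts = defaultdict(Counter)
    for observation, message in dataset:
        counts[enc(observation)][message] += 1
    # For each emergent symbol, pick the target message it co-occurs with most.
    return {symbol: c.most_common(1)[0][0] for symbol, c in counts.items()}


dataset = collect_dataset(50)
translator = fit_forward_translator(dataset)


def speak(observation):
    """Joiner's sender pipeline: frozen encoder followed by learned translation."""
    return translator[enc(observation)]
```

Under these assumptions, `speak(o)` produces the target community's message for each goal `o` without the encoder ever being retrained. A neural translator trained with a cross-entropy loss would play the same role in a less toy-like setting.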

Paper Structure (23 sections, 4 equations, 12 figures, 1 table)

Introduction
Background
Decentralised POMDPs
Emergent Communication
Imitation Learning
Cooperative Language Acquisition
Problem Definitions
Disentangling Environment-Level and Communicative Competencies
Methods for Constructing Joiners
Imitation Learning (IL)
Emergent Communication Pretraining and Translation Learning (ECTL)
Experiments
Environments
Creating Target Communities
Ablating Target Community Agents
...and 8 more sections

Figures (12)

Figure 1: (a) Agent architecture diagram for two agents (top and bottom) with communication. (b) Illustration of the gridworld toy environment with two agents, depicted as blue and green circles on a 5x5 grid. The green and blue stars indicate the locations of each agent's respective goal. The thought bubble depicts the world as the green agent observes it. Note that each agent does not see its own goal, and each agent's knowledge of the other's goal is off by one square. (c) Illustration of the driving environment with two agents. The dark circle in the centre is the 'pit', the region in which agents receive large penalties on entering. The stars indicate goal locations, which can spawn in one of eight locations (the unused locations are indicated by the greyed-out stars). The continuous state space is indicated by the grid axes. The agents are represented by arrowheads indicating their current position and direction. Here again, agents do not observe their own goals and therefore need to communicate, but unlike in the gridworld, they have perfect knowledge of the other agent's goal.
Figure 2: Illustration of the problem of compounding errors for Imitation Learning. When IL agents exit the expert state distribution they are unable to recover.
Figure 3: Training architecture diagrams for the Imitation Learning (IL) and Emergent Communication pretraining and Translation Learning (ECTL) methods. The forward and backward problems for each method are solved with supervised learning. Circles are variables and trapezoids are functions. Dotted lines indicate that gradients are blocked from backpropagating along a path. For the ECTL diagrams, the $enc$ and $\pi^c$ functions are learned during the emergent communication pretraining phase. The variables $o^s_t, o^r_t, m_t, a_t$ are, respectively, the sender and receiver agent observations, the sender's message, and the receiver's action, all taken from the dataset of interactions collected from the target community.
Figure 4: Performance results from cases in which (a) the team is formed of a variety of agents, comparing ECTL to Imitation Learning (IL) with unbiased and biased data from the gridworld environment, and (b) the target community agents are ablated in different ways. All results are from 500 evaluation episodes, and the error bars show 95% confidence intervals around the means.
Figure 5: Comparisons between IL and ECTL on the Driving Communication environment.
...and 7 more figures
