Unsupervised Translation of Emergent Communication

Ido Levy, Orr Paradise, Boaz Carmeli, Ron Meir, Shafi Goldwasser, Yonatan Belinkov

TL;DR

This paper tackles translating emergent communication (EC) developed by AI agents in referential games into natural language without parallel data. It proposes a two-phase approach: extract an EC corpus from agent interactions and apply unsupervised neural machine translation (UNMT) with an English linguistic prior from image captions, leveraging back-translation and denoising. The study systematically analyzes how task complexity, defined by semantic diversity of environments (Random, Category, Supercategory, Inter-category), affects EC translatability, introducing a benchmark of EC-specific translation metrics. The findings show UNMT can translate EC to English, with translatability varying by complexity and revealing a nuanced relationship between compositionality, entropy, and translation quality, offering a path toward interpretable AI emergent languages.
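The denoising objective mentioned above is usually trained by corrupting a sentence and asking the model to reconstruct the original. Below is a minimal sketch of one common corruption scheme from the UNMT literature (random token dropout followed by a bounded local shuffle), applied to a six-symbol EC message. The function name `add_noise`, the parameters `drop_prob` and `max_shift`, their default values, and the symbol names are illustrative assumptions; the paper's actual noise settings are not stated here.

```python
import random
from typing import List, Optional


def add_noise(
    tokens: List[str],
    drop_prob: float = 0.1,   # assumed value, not taken from the paper
    max_shift: int = 3,       # assumed value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Corrupt a token sequence for a denoising auto-encoding objective."""
    rng = rng or random.Random()

    # Token dropout: keep each token with probability 1 - drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    # Avoid returning an empty sequence for a non-empty input.
    if not kept and tokens:
        kept = [rng.choice(tokens)]

    # Local shuffle: add uniform noise in [0, max_shift + 1) to each index
    # and sort by the noisy key, so tokens only move a few positions.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    # Hypothetical EC message: six symbols (EOS omitted for simplicity).
    message = ["s17", "s4", "s4", "s92", "s8", "s31"]
    print(add_noise(message, rng=random.Random(0)))
```

During training, the model would receive the output of `add_noise` as input and the clean message (or caption) as the reconstruction target; back-translation then supplies the cross-lingual signal that denoising alone cannot.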

Abstract

Emergent Communication (EC) provides a unique window into the language systems that emerge autonomously when agents are trained to jointly achieve shared goals. However, it is difficult to interpret EC and evaluate its relationship with natural languages (NL). This study employs unsupervised neural machine translation (UNMT) techniques to decipher ECs formed during referential games with varying task complexities, influenced by the semantic diversity of the environment. Our findings demonstrate UNMT's potential to translate EC, illustrating that task complexity characterized by semantic diversity enhances EC translatability, while higher task complexity with constrained semantic variability exhibits pragmatic EC, which, although challenging to interpret, remains suitable for translation. This research marks the first attempt, to our knowledge, to translate EC without the aid of parallel data.

Paper Structure

This paper contains 36 sections, 8 figures, 8 tables.

Figures (8)

  • Figure 1: (a) Illustration of the referential game setup. The Sender observes an image and sends a message to the Receiver, who must identify the correct image from a set of candidates based on the message received. The exchanged messages are recorded to create the EC corpus. (b) Using the monolingual EC corpus and a monolingual English caption corpus to train the UNMT system. (c) The UNMT translating an EC message into English.
  • Figure 2: Illustration of various levels of game complexity in referential games. The target image, a giraffe, is shown alongside different levels of distractors: a red fire hydrant representing a random game; another giraffe for category discrimination; a cow in a lush field for supercategory discrimination; and a zebra alongside a giraffe and a person wavesurfing for an Inter-category game, where overlapping concepts (giraffes) illustrate the images' inherent complexity in a multi-category setting. Each column exemplifies the escalation of game complexity and the corresponding increase in potential target and distractor candidates.
  • Figure 3: Selected translation examples. Each panel shows an EC message, composed of six symbols followed by an EOS symbol, paired with its corresponding translation beneath. The translations capture nuanced details from the visuals, producing coherent and contextually appropriate text. Notably, in panel (c), the MT model extends its response beyond the visible elements in the image by adding "and a mouse", suggesting a tendency to "hallucinate" details, potentially influenced by prior contextual knowledge and previous training on captions of similar length.
  • Figure 4: Correlation matrix for Inter-Category complexity across different seeds. The near-zero correlation between seeds indicates that the translation model performs differently on the same examples from one seed to another. Additionally, the exact match metrics correlate strongly with one another and positively with the semantic metric within each seed, highlighting the significant text-image alignment achieved by our translations.
  • Figure 5: Bipartite network depicting the relationships between EC compositionality metrics and translatability metrics. Edge thickness is proportional to the absolute Pearson correlation (blue for positive, red for negative).
  • ...and 3 more figures