Searching for the Most Human-like Emergent Language

Brendon Boldt, David Mortensen

TL;DR

The study tackles the challenge of making emergent languages resemble human language by optimizing a signalling-game environment with XferBench, a transfer-based language-model measure. It demonstrates that Bayesian hyperparameter search can produce emergent languages that outperform existing corpora in deep transfer to human language, and it reveals a meaningful relation between entropy and transfer performance, including an entropy-based Pareto frontier. Key findings include that large vocabularies (around 10k tokens), increased model capacity, and longer, information-rich messages improve realism, and that entropy acts as both a driver and bound for transfer performance. The work provides practical hyperparameter recommendations and emphasizes entropy minimization as an emergent property, offering a principled path toward realistic synthetic language data for NLP pretraining and evaluation.

Abstract

In this paper, we design a signalling game-based emergent communication environment to generate state-of-the-art emergent languages in terms of similarity to human language. This is done with hyperparameter optimization, using XferBench as the objective function. XferBench quantifies the statistical similarity of emergent language to human language by measuring its suitability for deep transfer learning to human language. Additionally, we demonstrate the predictive power of entropy on the transfer learning performance of emergent language as well as corroborate previous results on the entropy-minimization properties of emergent communication systems. Finally, we report generalizations regarding what hyperparameters produce more realistic emergent languages, that is, ones which transfer better to human language.

Paper Structure

This paper contains 48 sections, 2 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Hyperparameter search shows that emergent and human languages tend towards the Pareto frontier of minimizing entropy and minimizing XferBench score (lower is better) while non-emergent synthetic languages less reliably follow this trend. Dashed gray line represents a lower bound on entropy versus XferBench score.
  • Figure 2: Illustration of hyperparameter optimization with XferBench (adapted from Boldt and Mortensen, 2024; CC BY 4.0 License).
  • Figure 3: Examples of different hyperparameter--objective relations observed in the various searches and hyperparameters. From left to right, we have: (a) a clear best value, (b) a clear trend outside the provided range, (c) a weak trend toward a particular value, and (d) no definite trend. The $y$-axis is based on different "sizes" of XferBench-da, normalized to similar scales.
  • Figure 4: Plot of XferBench scores on emergent and human languages. XB 1--3 are emergent language corpora derived from Search 4 and Entropy 1--3 from Search 6e.
  • Figure 5: Accuracy versus XferBench for Search 5r. Accuracy is measured as the proportion of rounds for which the correct observation is ranked in the top-$1$ percentile among all distractors.
  • ...and 7 more figures