Emergent Language: A Survey and Taxonomy

Jannik Peters, Constantin Waubert de Puiseau, Hasan Tercan, Arya Gopikrishnan, Gustavo Adolpho Lucas De Carvalho, Christian Bitter, Tobias Meisen

TL;DR

This survey addresses how agents autonomously develop discrete emergent language to coordinate in multi-agent reinforcement learning. It introduces a comprehensive taxonomy linking communication settings, language games, priors, and linguistic characteristics to a unified suite of metrics, enabling consistent comparisons across studies. The work highlights grounding, compositionality, consistency, and generalization as central semantic facets, and pragmatic facets such as predictability, efficiency, and signaling dynamics for assessing practical utility. By clarifying terminology and standardizing evaluation, the paper aims to advance emergent language (EL) research toward reproducible, human-aligned communication and more robust human–agent interaction.

Abstract

The field of emergent language represents a novel area of research within the domain of artificial intelligence, particularly within the context of multi-agent reinforcement learning. Although the concept of studying language emergence is not new, early approaches were primarily concerned with explaining human language formation, with little consideration given to its potential utility for artificial agents. In contrast, studies based on reinforcement learning aim to develop communicative capabilities in agents that are comparable to or even superior to human language. Thus, they extend beyond the learned statistical representations that are common in natural language processing research. This gives rise to a number of fundamental questions, from the prerequisites for language emergence to the criteria for measuring its success. This paper addresses these questions by providing a comprehensive review of 181 scientific publications on emergent language in artificial intelligence. Its objective is to serve as a reference for researchers interested in or proficient in the field. Consequently, the main contributions are the definition and overview of the prevailing terminology, the analysis of existing evaluation methods and metrics, and the description of the identified research gaps.

Paper Structure

This paper contains 91 sections, 67 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: The different forms of communication. They are divided by type of recipients and purpose. Intrapersonal communication encompasses self-centered communication like internal vocalization. The remaining forms of communication are directed externally and are utilized to transmit information to individuals, in the interpersonal setting, or to groups of addressees. In group communication the participants usually have a common goal, whereas public communication focuses on the general transfer of information to a group of interested but not necessarily goal-aligned entities. Finally, mass communication describes any form of communication that is directed towards a general audience and focuses on availability, for example, through the use of various media, including the internet. Adapted from jones.2018.
  • Figure 2: Interpersonal communication. The actors are communicators A and B, each depicted by an orange circle. Each is situated in an individual environment, depicted by the blue and green ellipses. At the overlap point, the red arrow indicates the available communication channel. The potential environmental noise influencing the communication is represented by grey arrows going through the entire image. Adapted from adler.2012.
  • Figure 3: The semiotic cycle. This framework of a language-based exchange between two separate entities, called speaker and listener, categorizes the process into three levels. The sensori-motor level encompasses sensor and world-oriented components, the conceptual level includes internal and intangible parts like the individual world model and conceptualization capabilities, and the linguistic level consists of the production and comprehension of the linguistic exchange, which is the externalized connection between speaker and listener. Adapted from bleys.2015, vaneecke.2020.
  • Figure 4: Major levels of linguistic structure. This conceptual structure depicts the elements of a language as concentric rings that build upon each other. The rings run from the innermost, phonetics, through phonology, morphology, syntax, and semantics, to the outermost, pragmatics. Adapted from ieeecomputersociety.2005.
  • Figure 5: Number of publications per year and by type. The number of publications per year is provided in the leftmost column, and the distribution of different publication types is shown in the remaining columns.
  • ...and 8 more figures