The Five Ws of Multi-Agent Communication: Who Talks to Whom, When, What, and Why -- A Survey from MARL to Emergent Language and LLMs

Jingdi Chen, Hanqing Yang, Zongjun Liu, Carlee Joe-Wong

TL;DR

This survey unifies three MA-Comm strands—MARL-based learned communication, Emergent Language, and LLM-powered coordination—under the Five Ws framework: who communicates with whom, what is shared, when communication occurs, why it helps, and how it is implemented. It traces the evolution from hand-designed MARL protocols to end-to-end learned messaging, discrete emergent languages, and language-grounded LLM systems, and it articulates bridging patterns across paradigms. The work highlights practical design patterns, cross-paradigm gaps, and open challenges—grounding, interpretability, scalability, and theoretical guarantees—to guide future hybrid systems that couple learning, language, and control. By providing a cross-cutting taxonomy, bridge sections, and forward-looking open problems, the paper aims to catalyze robust, scalable, and human-aligned multi-agent communication in real-world settings.

Abstract

Multi-agent sequential decision-making powers many real-world systems, from autonomous vehicles and robotics to collaborative AI assistants. In dynamic, partially observable environments, communication is often what reduces uncertainty and makes collaboration possible. This survey reviews multi-agent communication (MA-Comm) through the Five Ws: who communicates with whom, what is communicated, when communication occurs, and why communication is beneficial. This framing offers a clean way to connect ideas across otherwise separate research threads. We trace how communication approaches have evolved across three major paradigms. In Multi-Agent Reinforcement Learning (MARL), early methods used hand-designed or implicit protocols, followed by end-to-end learned communication optimized for reward and control. While successful, these protocols are frequently task-specific and hard to interpret, motivating work on Emergent Language (EL), where agents can develop more structured or symbolic communication through interaction. EL methods, however, still struggle with grounding, generalization, and scalability, which has fueled recent interest in large language models (LLMs) that bring natural language priors for reasoning, planning, and collaboration in more open-ended settings. Across MARL, EL, and LLM-based systems, we highlight how different choices shape communication design, where the main trade-offs lie, and what remains unsolved. We distill practical design patterns and open challenges to support future hybrid systems that combine learning, language, and control for scalable and interpretable multi-agent collaboration.

Paper Structure (203 sections, 10 equations, 25 figures, 15 tables)


Figures (25)

  • Figure 1: Three core dimensions of multi-agent communication surveyed in this work: MARL-based, emergent language, and LLM-driven communication.
  • Figure 2: MARL-Comm Agents Taxonomy
  • Figure 3: Organization of the key dimensions of communication design in Sec. \ref{sec:marlcomm}: what and whom to communicate with (Sec. \ref{sec:MARL_what_to_comm}), when to communicate under resource constraints (Sec. \ref{sec:MARL_when_to_comm}), and why to communicate and how communication is shaped by interaction scenarios and learning architectures (Sec. \ref{sec:MARL_how_to_comm}). It highlights how various communication mechanisms are conditioned on task structure, bandwidth limitations, agent relationships, and training/execution paradigms.
  • Figure 4: Centralized and Decentralized Learning. In centralized learning (a), policies are jointly optimized with shared knowledge and gradient flow. In decentralized learning (b), each agent operates and learns independently using local observations and rewards.
  • Figure 5: Three types of CTDE schemes.
  • ...and 20 more figures