Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models

Brennen A. Hill, Mant Koh En Wei, Thangavel Jishnuanandh

TL;DR

This work asks whether multi-agent coordination under partial observability benefits more from emergent communication or from engineered, predictive world models. It contrasts Learned Direct Communication (LDC) with Intention Communication, which is powered by a lightweight Imagined Trajectory Generation Module (ITGM) and a Message Generation Network (MGN). Empirical results on grid-world task allocation show that LDC offers limited scalability, whereas ITGM-based communication sustains high performance and sample efficiency as environment size and partial observability increase. The findings argue that integrating structured, predictive world models into multi-agent reinforcement learning (MARL) provides a robust foundation for goal-directed coordination, with implications for embodied AI and scalable multi-agent systems.

Abstract

Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses this plan into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
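The Intention Communication pipeline described in the abstract (roll the agent's own policy forward through a world model, then compress the imagined plan into a message) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper's ITGM and MGN are learned networks, whereas `step`, `greedy_policy`, `imagine_trajectory`, and `compress_plan` are hypothetical hand-written stand-ins, and the action set and message format are not taken from the paper.

```python
from typing import List, Tuple

Pos = Tuple[int, int]

# Hypothetical action set for a grid world: (dx, dy) offsets plus "stay".
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "stay": (0, 0)}


def step(pos: Pos, action: str, size: int) -> Pos:
    """Stand-in for the learned world model: deterministic move, clipped to the grid."""
    dx, dy = ACTIONS[action]
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)


def greedy_policy(pos: Pos, goal: Pos) -> str:
    """Stand-in for the agent's own policy: close the x gap first, then the y gap."""
    if pos[0] != goal[0]:
        return "right" if goal[0] > pos[0] else "left"
    if pos[1] != goal[1]:
        return "down" if goal[1] > pos[1] else "up"
    return "stay"


def imagine_trajectory(pos: Pos, goal: Pos, horizon: int, size: int) -> List[Pos]:
    """ITGM-style rollout: query the policy, apply the model, repeat for `horizon` steps."""
    trajectory = []
    for _ in range(horizon):
        pos = step(pos, greedy_policy(pos, goal), size)
        trajectory.append(pos)
    return trajectory


def compress_plan(trajectory: List[Pos], size: int) -> Tuple[float, float]:
    """MGN stand-in (an assumption): send only the normalised final imagined position."""
    x, y = trajectory[-1]
    return (x / (size - 1), y / (size - 1))


plan = imagine_trajectory(pos=(0, 0), goal=(2, 1), horizon=4, size=6)
message = compress_plan(plan, size=6)
```

The point of the sketch is the data flow: the message summarises where the agent expects to be, not what it currently observes, so a teammate receiving it can pick a different goal without seeing the sender.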

Paper Structure

This paper contains 48 sections, 3 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: A visualization of the $6 \times 6$ grid world environment with two agents (blue and red) and two goals (green squares).
  • Figure 2: The LDC architecture. Each agent's policy network takes its local observation and the incoming message from the previous timestep to produce an action and an outgoing message for the next timestep. The critic network estimates the state value.
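
The message flow in the Figure 2 caption (local observation plus the previous timestep's incoming message in, action plus next-timestep outgoing message out) can be sketched as a small loop. This is an illustrative toy under stated assumptions, not the paper's architecture: `LDCAgent` and `rollout` are hypothetical names, the policy is a single random linear layer rather than a trained network, the critic is omitted, and the dimensions are arbitrary.

```python
import random
from typing import List, Tuple


class LDCAgent:
    """Toy stand-in for an LDC policy: one linear layer over [observation, incoming message]."""

    def __init__(self, obs_dim: int, msg_dim: int, n_actions: int, seed: int) -> None:
        rng = random.Random(seed)
        in_dim = obs_dim + msg_dim
        self.n_actions = n_actions
        # One weight row per output: n_actions logits followed by msg_dim message values.
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(n_actions + msg_dim)
        ]

    def act(self, obs: List[float], msg_in: List[float]) -> Tuple[int, List[float]]:
        x = list(obs) + list(msg_in)
        out = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        logits, msg_out = out[: self.n_actions], out[self.n_actions:]
        action = max(range(self.n_actions), key=lambda a: logits[a])  # greedy action
        return action, msg_out


def rollout(agents: List[LDCAgent], observations: List[List[List[float]]], msg_dim: int):
    """Two-agent loop; observations[t][i] is agent i's local observation at step t."""
    inbox = [[0.0] * msg_dim for _ in agents]  # no message exists before the first step
    actions_per_step = []
    for obs_t in observations:
        results = [agent.act(obs, inbox[i]) for i, (agent, obs) in enumerate(zip(agents, obs_t))]
        actions_per_step.append([action for action, _ in results])
        # One-step delay: each agent reads the other's message at the next timestep.
        inbox = [results[1 - i][1] for i in range(len(agents))]
    return actions_per_step, inbox


agents = [LDCAgent(obs_dim=2, msg_dim=3, n_actions=5, seed=0),
          LDCAgent(obs_dim=2, msg_dim=3, n_actions=5, seed=1)]
observations = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(3)]
actions, final_inbox = rollout(agents, observations, msg_dim=3)
```

In an end-to-end learned protocol the message values carry no predefined meaning; any meaning would have to emerge from training, which this untrained sketch does not attempt to show.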