Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models

Brennen A. Hill; Mant Koh En Wei; Thangavel Jishnuanandh

TL;DR

This work interrogates whether multi-agent coordination under partial observability benefits more from emergent communication or from engineered, predictive world models. It contrasts Learned Direct Communication (LDC) with Intention Communication powered by a lightweight Imagined Trajectory Generation Module (ITGM) and a Message Generation Network (MGN). Empirical results in grid-world task allocation show that LDC offers limited scalability, whereas ITGM-based communication sustains high performance and sample efficiency as environment size and partial observability increase. The findings argue that integrating structured, predictive world models into MARL provides a robust foundation for goal-directed coordination, with implications for embodied AI and scalable multi-agent systems.

Abstract

Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses the resulting imagined trajectory (the agent's plan) into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world-model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
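The abstract describes the ITGM and MGN only at a high level. The following is a minimal sketch, under assumptions, of the data flow it implies: the agent rolls its own policy forward through a world model to obtain an imagined trajectory, and a message generator compresses that trajectory into a fixed-size message. All names (`grid_world_model`, `greedy_policy`, `imagine_trajectory`, `message_from_trajectory`) are hypothetical. The hand-coded grid transition and the fixed 3-number encoding stand in for the learned networks the paper describes. This is not the authors' implementation.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]   # (row, col) of the agent in a square grid
Action = Tuple[int, int]  # displacement (d_row, d_col)


def grid_world_model(state: State, action: Action, size: int) -> State:
    """Hypothetical stand-in for the learned ITGM dynamics: a deterministic
    grid transition that clips the agent to the board."""
    r = min(max(state[0] + action[0], 0), size - 1)
    c = min(max(state[1] + action[1], 0), size - 1)
    return (r, c)


def greedy_policy(goal: State) -> Callable[[State], Action]:
    """Hypothetical agent policy: move toward `goal`, rows first, then stay."""
    def act(state: State) -> Action:
        if state[0] != goal[0]:
            return (1 if goal[0] > state[0] else -1, 0)
        if state[1] != goal[1]:
            return (0, 1 if goal[1] > state[1] else -1)
        return (0, 0)
    return act


def imagine_trajectory(
    start: State,
    policy: Callable[[State], Action],
    world_model: Callable[[State, Action, int], State],
    horizon: int,
    size: int,
) -> List[State]:
    """ITGM-style rollout: at each imagined step, ask the agent's own policy
    for an action, then ask the world model for the next state."""
    traj = [start]
    state = start
    for _ in range(horizon):
        state = world_model(state, policy(state), size)
        traj.append(state)
    return traj


def message_from_trajectory(traj: List[State], size: int) -> List[float]:
    """Hypothetical MGN stand-in: compress a plan into a fixed 3-float message
    (normalized imagined endpoint, fraction of the horizon spent moving)."""
    end = traj[-1]
    horizon = len(traj) - 1
    moving_steps = sum(1 for a, b in zip(traj, traj[1:]) if a != b)
    return [end[0] / (size - 1), end[1] / (size - 1), moving_steps / max(horizon, 1)]
```

For example, on a 5x5 grid an agent at `(0, 0)` heading for a task at `(2, 3)` with a 6-step horizon imagines reaching the task after 5 moves, and broadcasts roughly `[0.5, 0.75, 0.83]`. Under LDC, by contrast, the message would be a learned function of the agent's current observation rather than of an imagined future, which is the plans-versus-percepts distinction in the title.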
