Learning to Interact in World Latent for Team Coordination

Dongsu Lee, Daehee Lee, Yaru Niu, Honguk Woo, Amy Zhang, Ding Zhao

TL;DR

This paper addresses the challenge of coordinating multiple agents under partial observability by introducing Interactive World Latent (IWoL), a unified latent representation that encodes both inter-agent relations and task-relevant world information. IWoL learns this latent through a training-time graph-attention communication protocol and decoupled decoders (Interactive and World) that align the latent with coordination cues, while enabling two deployment modes: implicit (no messages at test time) and explicit (messages used by the policy). Across four robotics MARL benchmarks, IWoL variants consistently outperform strong baselines, demonstrate robustness to incomplete observations, and scale to large agent populations; ablations confirm the critical role of world and interactive decoders. The approach offers a simple, efficient drop-in solution for robust multi-agent coordination with potential for broad applicability and generalization in open-world MARL settings.

Abstract

This work presents a novel representation learning framework, interactive world latent (IWoL), to facilitate team coordination in multi-agent reinforcement learning (MARL). Building an effective representation for team coordination is a challenging problem, due to the intricate dynamics emerging from multi-agent interaction and the incomplete information induced by local observations. Our key insight is to construct a learnable representation space that jointly captures inter-agent relations and task-specific world information by directly modeling communication protocols. With this representation, we maintain fully decentralized execution with implicit coordination, all while avoiding the inherent drawbacks of explicit message passing, e.g., slower decision-making, vulnerability to malicious attackers, and sensitivity to bandwidth constraints. In practice, our representation can be used not only as an implicit latent for each agent, but also as an explicit message for communication. Across four challenging MARL benchmarks, we evaluate both variants and show that IWoL provides a simple yet powerful key for team coordination. Moreover, we demonstrate that our representation can be combined with existing MARL algorithms to further enhance their performance.

Paper Structure

This paper contains 38 sections, 21 equations, 13 figures, 4 tables, 1 algorithm.

Figures (13)

  • Figure 1: A motivating example. Explicit communication's performance degradation in two challenging scenarios: Type I. Bandwidth (bits per second) constraint; Type II. Communication attack.
  • Figure 2: Overview diagram of the IWoL framework. Each grey box represents an agent. (Left) Implicit variant of IWoL, in which agents do not use communication messages at execution time. (Right) Explicit variant of IWoL, in which the policy directly uses a communication message from an explicit communication protocol. Note that IWoL's value function is decentralized: it takes the agent's own message and local embedding as input, $V_{\theta_i}(m_i^t, f_i^t)$. Herein, $\boldsymbol{\cdot}_{-i}$ denotes the elements of all agents except $i$, and $\boldsymbol{\cdot}_{(-i)i}$ denotes the elements of all agents including $i$.
  • Figure 3: Design for observation encoder.
  • Figure 4: Design for communication protocol.
  • Figure 5: Diagram for Interactive World Latent Modeling. Solid and dotted lines denote forward and backward processes, respectively. Orange lines mark paths that are active only during training. (Left) Im-IWoL uses the world and interactive decoders to reconstruct the privileged state and the communication message $\hat{m}_i^t = \mathrm{Decoder}_{\mathrm{I}}\bigl(z_i^t\bigr)$. (Right) Ex-IWoL sets the latent to the message vector, $z_i^t = m_i^t$, and then uses the world decoder to reconstruct the privileged state $\hat{s}_i^t = \mathrm{Decoder}_{\mathrm{W}}\bigl(z_i^t\bigr)$, thereby encouraging $z_i^t$ to embed the global information $s_i^t$.
  • ...and 8 more figures
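
The two decoders described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the linear decoders, the squared-error reconstruction losses, the dimensions, and every function name (`make_linear`, `decode`, `im_iwol_losses`, `ex_iwol_loss`) are assumptions made only to show how $\hat{m}_i^t = \mathrm{Decoder}_{\mathrm{I}}(z_i^t)$ and $\hat{s}_i^t = \mathrm{Decoder}_{\mathrm{W}}(z_i^t)$ relate to the latent in the implicit and explicit variants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, MSG_DIM, STATE_DIM = 8, 6, 10


def make_linear(in_dim, out_dim):
    """Return (weights, bias) of a linear map standing in for a decoder network."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def decode(params, z):
    """Apply the linear stand-in decoder to a latent vector."""
    w, b = params
    return z @ w + b


def mse(pred, target):
    """Squared-error reconstruction loss (an assumed choice of loss)."""
    return float(np.mean((pred - target) ** 2))


decoder_i = make_linear(LATENT_DIM, MSG_DIM)    # Decoder_I: latent -> message
decoder_w = make_linear(LATENT_DIM, STATE_DIM)  # Decoder_W: latent -> privileged state


def im_iwol_losses(z, m, s):
    """Implicit variant: reconstruct both the message and the state from z."""
    m_hat = decode(decoder_i, z)
    s_hat = decode(decoder_w, z)
    return mse(m_hat, m), mse(s_hat, s)


def ex_iwol_loss(m, s, decoder_w_ex):
    """Explicit variant: the latent is the message itself (z = m),
    so only the privileged state is reconstructed."""
    z = m
    return mse(decode(decoder_w_ex, z), s)


# Toy inputs for a single agent at a single timestep.
z = rng.normal(size=LATENT_DIM)
m = rng.normal(size=MSG_DIM)
s = rng.normal(size=STATE_DIM)

loss_msg, loss_state = im_iwol_losses(z, m, s)
decoder_w_ex = make_linear(MSG_DIM, STATE_DIM)  # world decoder sized for z = m
loss_ex = ex_iwol_loss(m, s, decoder_w_ex)
```

In this sketch the message and privileged state appear only as reconstruction targets, which mirrors the caption's point that these paths are needed during training; at execution time the implicit variant would use `z` alone.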