
EmCoop: A Framework and Benchmark for Embodied Cooperation Among LLM Agents

Hanqing Yang, Shiyu Chen, Narjes Nourzad, Marie Siew, Jingdi Chen, Carlee Joe-Wong

TL;DR

This paper introduces EmCoop, a benchmark framework for studying cooperation in LLM-based embodied multi-agent systems, and proposes generalizable, process-level metrics that diagnose collaboration quality and failure modes, beyond final task success.

Abstract

Real-world scenarios increasingly require multiple embodied agents to collaborate in dynamic environments under embodied constraints, as many tasks exceed the capabilities of any single agent. Recent advances in large language models (LLMs) enable high-level cognitive coordination through reasoning, planning, and natural language communication. However, fine-grained analyses of how such collaboration emerges, unfolds, and contributes to task success in embodied multi-agent systems are difficult to conduct with existing benchmarks. In this paper, we introduce EmCoop, a benchmark framework for studying cooperation in LLM-based embodied multi-agent systems. Our framework separates a high-level cognitive layer from a low-level embodied interaction layer, allowing us to characterize agent cooperation through their interleaved dynamics over time. Given a cooperation-constrained embodied task, we propose generalizable, process-level metrics that diagnose collaboration quality and failure modes, beyond final task success. We instantiate our framework in two embodied environments that scale to arbitrary numbers of agents and support diverse communication topologies, and use these instantiations to demonstrate how EmCoop enables systematic analysis of cooperation dynamics across team sizes and task settings. The project web page can be found at: https://happyeureka.github.io/emcoop.

Paper Structure (42 sections, 28 equations, 21 figures, 6 tables, 1 algorithm)


Figures (21)

  • Figure 1: Overview of EmCoop. (1) Agents operate through a dual-layer interface that bridges high-level planning (in the Cognitive Layer) and low-level execution (in the Primitive Layer) (Sec. \ref{sec:env-interface}). (2) We instantiate our generalizable task designs (Sec. \ref{sec:rq_method}) in our benchmark (Sec. \ref{sec:benchmark}), which contains two embodied environments (MA-Crafter and CUBE) and diverse cooperative tasks. (3) Within the Multi-Agent Environment Interaction Loop (MAEIL), agents make plans and communicate asynchronously in Layer I, with primitive action execution handled by Layer II. (4) Through (1), (2), and (3), we make cooperation observable and quantifiable with cooperative metrics that enable systematic analysis of cooperative dynamics (Sec. \ref{sec:metrics}).
  • Figure 2: Decoupling cognitive and environment clocks in multi-agent execution. At each cognitive step, an agent maintains an internal state consisting of its current MAEIL stage and its committed symbolic plan. Plans may be newly generated (green), resumed (yellow), or terminated (red, with an indication of success or failure).
  • Figure 3: Effects of cooperation topology, task difficulty, and agent count on cooperation dynamics in EmCoop. Radar plots show normalized cooperation metrics across centralized, decentralized, debate, and individual topologies for 2- and 3-agent teams under Easy and Hard settings, evaluated with GPT-5.2 and DeepSeek-V3.2 (DS). The bar plots report decision overhead and total messages, illustrating how EmCoop links constraint satisfaction, communication cost, and planning dynamics across cooperative settings. Centralized topologies generally achieve higher plan coherence, while individual agents exhibit higher plan stability and lower decision overhead due to the absence of communication (and thus fewer interruptions). Decentralized and debate settings increase communication load and tend to reduce coherence. Results shown are from MA-Crafter; full results for both environments are provided in Appendices \ref{app:resultsTable} and \ref{app:exp}.
  • Figure 4: Case study in MA-Crafter with three agents under Hard difficulty, using DeepSeek-V3.2 as the backend. The figure illustrates environment feedback over time, including task-state transitions, agent capability changes (CG), and cooperative constraint changes with respect to tasks.
  • Figure 5: Communication topologies considered in this work.
  • ...and 16 more figures
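
The per-agent state described in the Figure 2 caption (a current MAEIL stage plus a committed symbolic plan that is newly generated, resumed, or terminated) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `AgentState`, `PlanEvent`, and `cognitive_step`, the `"planning"` stage label, and the example plan string are all hypothetical, chosen only to make the three plan transitions concrete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class PlanEvent(Enum):
    """Plan transitions named in the Figure 2 caption."""
    NEW = "new"                # plan newly generated at this cognitive step
    RESUMED = "resumed"        # previously committed plan is continued
    TERMINATED = "terminated"  # plan ended, with success or failure


@dataclass
class AgentState:
    stage: str = "planning"     # hypothetical placeholder for the MAEIL stage
    plan: Optional[str] = None  # committed symbolic plan, if any


def cognitive_step(
    state: AgentState,
    new_plan: Optional[str] = None,
    outcome: Optional[bool] = None,
) -> Optional[Tuple[PlanEvent, Optional[bool]]]:
    """Advance one cognitive step and report what happened to the plan.

    outcome: True/False if the committed plan just finished (assumed to be
    reported by the environment layer), otherwise None.
    Returns None when the agent holds no plan and proposes none.
    """
    if outcome is not None and state.plan is not None:
        state.plan = None
        return (PlanEvent.TERMINATED, outcome)
    if new_plan is not None and new_plan != state.plan:
        state.plan = new_plan
        return (PlanEvent.NEW, None)
    if state.plan is not None:
        return (PlanEvent.RESUMED, None)
    return None


if __name__ == "__main__":
    agent = AgentState()
    print(cognitive_step(agent, new_plan="collect_wood"))
    print(cognitive_step(agent))
    print(cognitive_step(agent, outcome=True))
```

In this sketch the environment clock is deliberately absent: `cognitive_step` is called only when the agent deliberates, so any number of primitive actions could run between two calls, which is one simple way to read the decoupling the caption describes.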