CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation
Jie Liu, Pan Zhou, Yingjun Du, Ah-Hwee Tan, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves
TL;DR
CaPo addresses the challenge of coordinating large language model–driven embodied agents on long-horizon tasks by introducing a two-phase cooperative planning framework: (1) meta-plan generation, where agents collaboratively produce a long-term, coherent task decomposition, and (2) progress-adaptive meta-plan execution, where the plan is iteratively updated in response to new progress through multi-turn discussions. The meta-plan is drafted by a designated designer agent, critiqued by evaluator agents, and refined through iterative prompts, while a progress-adaptive module updates the plan whenever agents report new progress, such as discovering a target object. Experimental results on TDW-MAT and C-WAH show CaPo achieves higher task completion rates and efficiency than state-of-the-art baselines, including CoELA, ProAgent, and RoCo, across multiple LLMs and perception settings. The work demonstrates that structured, long-horizon planning combined with progress-driven adaptation markedly improves cooperative behavior in embodied multi-agent systems, with practical implications for complex, collaborative tasks in dynamic environments.
Abstract
In this work, we address the cooperation problem among large language model (LLM) based embodied agents, where agents must cooperate to achieve a common goal. Previous methods often execute actions extemporaneously and incoherently, without long-term strategic and cooperative planning, leading to redundant steps, failures, and even serious repercussions in complex tasks like search-and-rescue missions, where discussion and cooperative planning are crucial. To solve this issue, we propose Cooperative Plan Optimization (CaPo) to enhance the cooperation efficiency of LLM-based embodied agents. Inspired by human cooperation schemes, CaPo improves cooperation efficiency with two phases: 1) meta-plan generation, and 2) progress-adaptive meta-plan and execution. In the first phase, all agents analyze the task, discuss, and cooperatively create a meta-plan that decomposes the task into subtasks with detailed steps, ensuring a long-term strategic and coherent plan for efficient coordination. In the second phase, agents execute tasks according to the meta-plan and dynamically adjust it based on their latest progress (e.g., discovering a target object) through multi-turn discussions. This progress-based adaptation eliminates redundant actions, improving the overall cooperation efficiency of agents. Experimental results on the ThreeDWorld Multi-Agent Transport and Communicative Watch-And-Help tasks demonstrate that CaPo achieves a much higher task completion rate and efficiency than state-of-the-art methods. The code is released at https://github.com/jliu4ai/CaPo.
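
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the released CaPo implementation: the names `MetaPlan`, `generate_meta_plan`, `adapt_meta_plan`, `toy_designer`, and `toy_evaluator` are hypothetical, the designer and evaluators are plain Python functions standing in for LLM-driven agents, and the task string and the `max_rounds` discussion budget are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-ins for LLM-driven agents (assumptions, not the CaPo API).
Designer = Callable[[str, List[str]], List[str]]  # (task, feedback) -> subtasks
Evaluator = Callable[[List[str]], Optional[str]]  # subtasks -> feedback, or None to accept


@dataclass
class MetaPlan:
    subtasks: List[str]
    version: int = 0


def generate_meta_plan(task: str, designer: Designer,
                       evaluators: List[Evaluator], max_rounds: int = 3) -> MetaPlan:
    """Phase 1: the designer drafts a plan; evaluators critique it until all accept."""
    plan = MetaPlan(designer(task, []))
    for _ in range(max_rounds):
        feedback = []
        for evaluate in evaluators:
            comment = evaluate(plan.subtasks)
            if comment is not None:
                feedback.append(comment)
        if not feedback:  # every evaluator accepted the plan
            break
        plan = MetaPlan(designer(task, feedback), plan.version + 1)
    return plan


def adapt_meta_plan(plan: MetaPlan, progress: Optional[str], task: str,
                    designer: Designer, evaluators: List[Evaluator]) -> MetaPlan:
    """Phase 2: re-discuss the plan only when an agent reports new progress."""
    if progress is None:
        return plan  # nothing new, keep executing the current plan
    updated = generate_meta_plan(f"{task} | progress: {progress}", designer, evaluators)
    updated.version = plan.version + 1
    return updated


# Toy agents for illustration only.
def toy_designer(task: str, feedback: List[str]) -> List[str]:
    subtasks = ["Alice explores the kitchen", "Bob explores the living room"]
    if feedback:
        subtasks.append("Both agents transport found objects to the goal")
    return subtasks


def toy_evaluator(subtasks: List[str]) -> Optional[str]:
    if not any("transport" in s for s in subtasks):
        return "The plan never delivers objects to the goal."
    return None


task = "Transport the target objects to the goal"
plan = generate_meta_plan(task, toy_designer, [toy_evaluator])
unchanged = adapt_meta_plan(plan, None, task, toy_designer, [toy_evaluator])
updated = adapt_meta_plan(plan, "Bob found a target object", task,
                          toy_designer, [toy_evaluator])
```

In this sketch the plan is only re-discussed when progress is reported, which mirrors the abstract's point that progress-based adaptation avoids redundant actions; how CaPo actually prompts the agents and bounds the discussion is defined in the paper and the released code, not here.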
