
REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation

Puzhen Yuan, Angyuan Ma, Yunchao Yao, Huaxiu Yao, Masayoshi Tomizuka, Mingyu Ding

TL;DR

REMAC tackles long-horizon, multi-robot manipulation in dynamic scenes by embedding self-reflection and self-evolution in a vision-language planning loop. A scene exploration stage first builds environmental knowledge. Pre- and post-condition checks and a memory-augmented iterative loop then refine plan feasibility and task allocation. REMAC was evaluated in a RoboCasa-based simulator across 4 task categories with 50+ objects. It increases the average success rate by about 40% and execution efficiency by about 52.7% over a single-robot baseline, and reasoning models such as Grok3 benefit from the reflection framework. The work demonstrates a practical pathway to robust, scalable, scene-agnostic planning for long-horizon multi-robot manipulation.
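The TL;DR credits part of the gain to allocating work across robots, and Figure 1 (step 4) shows one robot delegating the door-opening to another. The minimal sketch below shows why parallel allocation shortens execution. It uses greedy list scheduling, a standard heuristic swapped in for illustration, not REMAC's VLM-driven allocation. Makespan in steps serves only as a stand-in for the paper's efficiency metric. The `Subtask` fields, the durations, and the `affinity` rule (the carrot must be placed by the robot holding it) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subtask:
    name: str
    duration: int       # hypothetical cost in simulation steps
    deps: tuple = ()    # subtasks that must finish before this one starts
    affinity: str = ""  # hypothetical: run on the same robot as this earlier subtask

def schedule(subtasks, num_robots):
    """Greedy list scheduling over a dependency-ordered subtask list.

    Each subtask goes to the robot that can start it earliest (robot idle and
    all dependencies finished), unless an affinity pins it to a given robot.
    Returns (makespan, {subtask name: robot index}).
    """
    finish = {}
    robot_free = [0] * num_robots
    assignment = {}
    for task in subtasks:  # assumes the list is in a valid topological order
        ready = max((finish[d] for d in task.deps), default=0)
        if task.affinity:
            robot = assignment[task.affinity]
        else:
            robot = min(range(num_robots), key=lambda r: max(robot_free[r], ready))
        start = max(robot_free[robot], ready)
        finish[task.name] = start + task.duration
        robot_free[robot] = finish[task.name]
        assignment[task.name] = robot
    return max(finish.values()), assignment

# Hypothetical durations for the Figure 1 microwave example.
plan = [
    Subtask("open_microwave", 2),
    Subtask("pick_carrot", 2),
    Subtask("place_carrot", 2, deps=("open_microwave", "pick_carrot"),
            affinity="pick_carrot"),
    Subtask("close_microwave", 1, deps=("place_carrot",)),
]
single, _ = schedule(plan, num_robots=1)           # 7 steps
multi, assignment = schedule(plan, num_robots=2)   # 5 steps
```

With one robot the four subtasks run back to back. With two, opening the door and picking up the carrot overlap. These numbers follow only from the invented durations and say nothing about the paper's reported 52.7%.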

Abstract

Vision-language models (VLMs) have demonstrated remarkable capabilities in robotic planning, particularly for long-horizon tasks that require a holistic understanding of the environment for task decomposition. Existing methods typically rely on prior environmental knowledge or carefully designed task-specific prompts, so they struggle with dynamic scene changes or unexpected task conditions. For example, a robot may try to put a carrot in the microwave but find the door closed. Such challenges highlight two critical issues: adaptability and efficiency. To address them, we propose REMAC, an adaptive multi-agent planning framework that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution through continuous reflection and self-evolution. REMAC incorporates two key modules. A self-reflection module performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans. A self-evolvement module dynamically adapts plans based on scene-specific reasoning. It offers several appealing benefits: 1) robots can first explore and reason about the environment without complex prompt design; 2) robots can keep reflecting on potential planning errors and adapt the plan based on task-specific insights; 3) after several iterations, a robot can call on another robot to execute subtasks in parallel, maximizing task execution efficiency. To validate REMAC's effectiveness, we build a multi-agent environment for long-horizon robot manipulation and navigation based on RoboCasa, featuring 4 task categories with 27 task styles and 50+ different objects. On this environment, we further benchmark state-of-the-art reasoning models, including DeepSeek-R1, o3-mini, QwQ, and Grok3. The results show that REMAC boosts average success rates by 40% and execution efficiency by 52.7% over the single-robot baseline.
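The interplay of the two modules can be illustrated with a toy, fully symbolic sketch. It is not the authors' implementation, and it makes several substitutions:

- The VLM pre- and post-condition checks become set-inclusion tests over hand-written facts.
- The reflection step records a missing fact and inserts the first hypothetical skill that supplies it.
- "Self-evolution" is reduced to dropping an action that a recorded repair immediately undid.

The skill names, `INVERSE`, `fail_first`, `max_retries`, and `max_steps` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    pre: frozenset                   # facts required before execution (pre-condition)
    add: frozenset                   # facts expected after execution (post-condition)
    delete: frozenset = frozenset()  # facts made false by execution

def skill(name, pre, add, delete=()):
    return Skill(name, frozenset(pre), frozenset(add), frozenset(delete))

# Hypothetical symbolic skills for the carrot-and-microwave example.
SKILLS = {s.name: s for s in [
    skill("pick_carrot", {"hand_empty"}, {"holding_carrot"}, {"hand_empty"}),
    skill("put_down_carrot", {"holding_carrot"}, {"hand_empty"}, {"holding_carrot"}),
    skill("open_microwave", {"hand_empty"}, {"door_open"}),
    skill("place_in_microwave", {"holding_carrot", "door_open"},
          {"carrot_in_microwave", "hand_empty"}, {"holding_carrot"}),
]}
INVERSE = {"pick_carrot": "put_down_carrot"}  # hypothetical: second step undoes the first

def run_episode(plan, state, reflection_db, fail_first=frozenset(),
                max_retries=2, max_steps=20):
    """Execute a plan with pre-/post-condition checks; returns (ok, executed, state)."""
    executed, queue = [], list(plan)
    while queue and max_steps > 0:
        max_steps -= 1
        sk = SKILLS[queue.pop(0)]
        missing = sk.pre - state
        if missing:  # pre-condition check failed: the plan itself is wrong
            repair = next((s for s in SKILLS.values()
                           if missing <= s.add and s.pre <= state), None)
            if repair is None:
                return False, executed, state
            reflection_db.append((sk.name, sorted(missing), repair.name))
            queue[:0] = [repair.name, sk.name]  # run the fix, then this step again
            continue
        for attempt in range(max_retries + 1):
            failed = sk.name in fail_first and attempt == 0  # simulated glitch
            new_state = state if failed else (state - sk.delete) | sk.add
            if sk.add <= new_state:  # post-condition check passed
                break
        else:
            return False, executed, state  # retries exhausted
        state = new_state
        executed.append(sk.name)
    return not queue, executed, state

def evolve(last_plan, reflection_db):
    """Self-evolution stand-in: drop a step that a recorded repair immediately undid."""
    repairs = {repair for _, _, repair in reflection_db}
    plan = []
    for name in last_plan:
        if plan and name in repairs and INVERSE.get(plan[-1]) == name:
            plan.pop()  # e.g. pick_carrot followed by the forced put_down_carrot
        else:
            plan.append(name)
    return plan

START = frozenset({"hand_empty"})
db = []
ok1, plan1, _ = run_episode(["pick_carrot", "open_microwave", "place_in_microwave"], START, db)
plan2 = evolve(plan1, db)
ok2, plan2_exec, final = run_episode(plan2, START, db, fail_first={"open_microwave"})
```

In the first iteration, the flawed plan succeeds only after two pre-condition repairs and five executed steps, similar to the redundant behavior of the CC setting in Figure 5. The evolved plan opens the door first and needs three steps. The post-condition retry absorbs the simulated glitch on `open_microwave`.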

Paper Structure

This paper contains 16 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: REMAC resolves planning errors in long-horizon, multi-stage tasks by employing condition checks and reflective evolution, thereby enhancing planning efficiency. 1) Task fails because the robot attempts to open the microwave door while holding a carrot. 2) Task succeeds after the robot, through a condition check process, recognizes the necessity of putting down the carrot. 3) Efficiency is enhanced when the robot correctly sequences its actions by opening the microwave door and then picking the carrot up. 4) Efficiency is further improved when the robot delegates the task of opening the door to another robot.
  • Figure 2: Left: Self-Reflection. Before executing subtask $i$, the VLM verifies the pre-conditions to determine whether the plan for subtask $i$ is executable given the observation after completing subtask $i-1$. If not, this indicates an error in the initial plan. The system then engages in a reflection process to identify the cause of the error and stores the result in the reflection database. After executing subtask $i$, the VLM verifies the post-conditions to assess whether the subtask succeeded, given the observation after execution. If not, the system retries the subtask. Right: Self-Evolvement. Once all subtasks have been completed in sequence, the reflection database serves as the foundation for generating the initial plan of the next iteration. The database contains the accumulated pre-condition-check analyses and the last-iteration plan. This knowledge-augmented process iteratively refines the planning logic, yielding a more feasible and efficient initial plan for future iterations.
  • Figure 3: We constructed four distinct tasks (OpenCabinetPnP, OpenMicrowavePnP, DefrostInBowl, and HeatOnStove) within the RoboCasa large-scale simulation framework to rigorously evaluate long-horizon multi-robot collaborative planning. Each task incorporates 6–8 spatial layouts, 5–12 dynamically configurable environmental styles, and over 50 graspable objects to simulate real-world complexity. At the beginning of each trial, the task is initialized with a randomized combination of layout, style, and object placements.
  • Figure 4: Our experimental results indicate two findings. (1) Condition checking and reflective evolution improve both the Task Success Rate and the Subtask Completion Rate. (2) Compared with single-robot systems, multi-robot systems have a shorter Length of Initial Plan and greater efficiency. Each task was evaluated on ten randomized initializations across four experimental settings.
  • Figure 5: BASE setting: The robot failed because it neglected the logical and spatial constraints of the environment, and the carrot was dropped on the ground. CC setting: The robot went through a series of redundant steps before realizing it needed to put down the carrot first. RE setting: The robot completed the task with high efficiency. REMAC setting: The robots completed the task collaboratively with even higher efficiency.
  • ...and 1 more figure
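Figures 3 and 4 describe randomized trial initialization and three reported metrics. The minimal sketch below shows how such an evaluation could be tallied. It assumes the following definitions, which may differ from the paper's exact ones:

- Task Success Rate is the fraction of successful trials.
- Subtask Completion Rate is the mean per-trial fraction of completed subtasks.
- Length of Initial Plan is averaged over trials.

`Trial`, `sample_init`, the object names, and the layout and style counts passed in are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Trial:
    success: bool          # did the whole task succeed?
    subtasks_done: int     # subtasks completed in this trial
    subtasks_total: int    # subtasks in the task
    initial_plan_len: int  # number of steps in the initial plan

def sample_init(rng, num_layouts, num_styles, objects):
    """Hypothetical randomized initialization: a layout, a style, and a target object."""
    return {
        "layout": rng.randrange(num_layouts),
        "style": rng.randrange(num_styles),
        "object": rng.choice(objects),
    }

def summarize(trials):
    """Aggregate per-trial records into the three metrics (assumed definitions)."""
    n = len(trials)
    return {
        "task_success_rate": sum(t.success for t in trials) / n,
        "subtask_completion_rate": sum(t.subtasks_done / t.subtasks_total for t in trials) / n,
        "initial_plan_length": sum(t.initial_plan_len for t in trials) / n,
    }

# Ten randomized initializations per setting; counts and objects are illustrative.
rng = random.Random(0)
configs = [sample_init(rng, 8, 12, ["carrot", "bowl", "apple"]) for _ in range(10)]

example = summarize([Trial(True, 4, 4, 3), Trial(False, 2, 4, 5)])
# task_success_rate 0.5, subtask_completion_rate 0.75, initial_plan_length 4.0
```

Seeding a single `random.Random` instance makes the ten initializations reproducible, so every experimental setting can be run on the same sampled configurations.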