REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation

Puzhen Yuan; Angyuan Ma; Yunchao Yao; Huaxiu Yao; Masayoshi Tomizuka; Mingyu Ding

TL;DR

REMAC tackles long-horizon, multi-robot manipulation in dynamic scenes by embedding self-reflection and self-evolution in a vision-language planning loop. A scene exploration stage first builds environmental knowledge; pre-condition and post-condition checks, together with a memory-augmented iterative loop, then refine plan feasibility and task allocation. Evaluated in a RoboCasa-based simulator on 4 task categories with 50+ objects, REMAC raises the average success rate by 40% and execution efficiency by 52.7% over a single-robot baseline, and reasoning models such as Grok3 benefit from the reflection framework. The work demonstrates a practical pathway to robust, scalable, scene-agnostic planning for long-horizon, multi-robot manipulation.

Abstract

Vision-language models (VLMs) have demonstrated remarkable capabilities in robotic planning, particularly for long-horizon tasks that require a holistic understanding of the environment for task decomposition. Existing methods typically rely on prior environmental knowledge or carefully designed task-specific prompts, so they struggle with dynamic scene changes or unexpected task conditions, e.g., a robot that attempts to put a carrot in the microwave but finds the door closed. Such challenges underscore two critical issues: adaptability and efficiency. To address them, we propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution through continuous reflection and self-evolution. REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning. It offers several appealing benefits: 1) Robots can initially explore and reason about the environment without complex prompt design. 2) Robots can keep reflecting on potential planning errors and adapting the plan based on task-specific insights. 3) After iterations, a robot can call another one to coordinate tasks in parallel, maximizing task execution efficiency. To validate REMAC's effectiveness, we build a multi-agent environment for long-horizon robot manipulation and navigation based on RoboCasa, featuring 4 task categories with 27 task styles and 50+ different objects. On this environment, we benchmark state-of-the-art reasoning models, including DeepSeek-R1, o3-mini, QwQ, and Grok3, and demonstrate REMAC's superiority: it boosts average success rates by 40% and execution efficiency by 52.7% over the single-robot baseline.
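
The pre-condition and post-condition checks described above can be illustrated with a toy loop built around the abstract's microwave example. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Step`, `Memory`, `reflect`, and `run` are hypothetical, the world state is a plain dictionary, and a hand-written repair rule stands in for the reasoning model that would propose plan revisions. Multi-robot task allocation is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, bool]  # toy world state (assumption: boolean facts only)


@dataclass
class Step:
    name: str
    pre: Callable[[State], bool]      # must hold before execution
    effect: Callable[[State], None]   # toy stand-in for robot execution
    post: Callable[[State], bool]     # must hold after execution


@dataclass
class Memory:
    insights: List[str] = field(default_factory=list)


def open_door() -> Step:
    return Step(
        name="open microwave door",
        pre=lambda s: not s["door_open"],
        effect=lambda s: s.update(door_open=True),
        post=lambda s: s["door_open"],
    )


def place_carrot() -> Step:
    return Step(
        name="place carrot in microwave",
        pre=lambda s: s["door_open"],
        effect=lambda s: s.update(carrot_in_microwave=True),
        post=lambda s: s["carrot_in_microwave"],
    )


def reflect(failed: Step, memory: Memory) -> List[Step]:
    """Hypothetical repair rule; a reasoning model would fill this role."""
    if failed.name == "place carrot in microwave":
        memory.insights.append("microwave door must be open before placing items")
        return [open_door()]
    return []  # no known repair


def run(plan: List[Step], state: State, memory: Memory, max_iters: int = 5) -> List[str]:
    """Execute the plan, checking conditions around every step."""
    executed: List[str] = []
    queue = list(plan)
    iters = 0
    while queue and iters < max_iters:
        iters += 1
        step = queue[0]
        if not step.pre(state):
            # Pre-condition failed: reflect and prepend repair steps.
            repair = reflect(step, memory)
            if not repair:
                break
            queue = repair + queue
            continue
        step.effect(state)
        if not step.post(state):
            # Post-condition failed: record the failure and stop.
            memory.insights.append(f"post-condition failed: {step.name}")
            break
        executed.append(queue.pop(0).name)
    return executed


if __name__ == "__main__":
    world = {"door_open": False, "carrot_in_microwave": False}
    mem = Memory()
    print(run([place_carrot()], world, mem))
    print(mem.insights)
```

Starting from a closed door, the loop detects the failed pre-condition, stores the insight in memory, inserts the door-opening step, and then completes the original step. The stored insight is what a later planning iteration could reuse to avoid the same failure.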
