Collaborative Multi-Robot Non-Prehensile Manipulation via Flow-Matching Co-Generation

Yorai Shaoul, Zhe Chen, Mohamed Naveed Gul Mohamed, Federico Pecora, Maxim Likhachev, Jiaoyang Li

TL;DR

The paper tackles scalable, collaborative multi-robot, multi-object non-prehensile manipulation in cluttered environments by integrating a generative, perception-driven approach with lightweight planning. It introduces GCo, a framework that uses flow-matching co-generation to propose contact formations and manipulation trajectories from images and couples this with Gspi, a scalable anonymous multi-robot motion planner. Among its instantiations, the discrete–continuous co-generation ($\textsc{GCo}_{DC}$) delivers the most reliable performance, and Gspi demonstrates strong scalability to large teams (over 100 robots) in dense scenarios. The results show GCo outperforms learning-based and heuristic baselines in both single- and multi-object manipulation, with substantial gains in success rates and efficiency, and establish the practicality of generative co-design for large-scale collaborative manipulation.

Abstract

Coordinating a team of robots to reposition multiple objects in cluttered environments requires reasoning jointly about where robots should establish contact, how to manipulate objects once contact is made, and how to navigate safely and efficiently at scale. Prior approaches typically fall into two extremes -- either learning the entire task or relying on privileged information and hand-designed planners -- both of which struggle to handle diverse objects in long-horizon tasks. To address these challenges, we present a unified framework for collaborative multi-robot, multi-object non-prehensile manipulation that integrates flow-matching co-generation with anonymous multi-robot motion planning. Within this framework, a generative model co-generates contact formations and manipulation trajectories from visual observations, while a novel motion planner conveys robots at scale. Crucially, the same planner also supports coordination at the object level, assigning manipulated objects to larger target structures and thereby unifying robot- and object-level reasoning within a single algorithmic framework. Experiments in challenging simulated environments demonstrate that our approach outperforms baselines in both motion planning and manipulation tasks, highlighting the benefits of generative co-design and integrated planning for scaling collaborative manipulation to complex multi-agent, multi-object settings. Visit gco-paper.github.io for code and demonstrations.
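
For readers unfamiliar with flow matching, the following is a sketch of one common formulation (linear-interpolation conditional flow matching), not necessarily the objective used in the paper. All symbols are assumptions introduced here for illustration: $x_0$ is a noise sample, $x_1$ is a data sample (for instance, a contact formation together with a manipulation trajectory), $c$ is the conditioning observation (for instance, an image), and $v_\theta$ is the learned velocity field.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; (x_1, c)}
\left[ \left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2 \right]
$$

Under this formulation, a sample is generated by drawing $x_0 \sim \mathcal{N}(0, I)$ and integrating $\dot{x} = v_\theta(x, t, c)$ from $t = 0$ to $t = 1$. The discrete component of $\textsc{GCo}_{DC}$ would need a discrete counterpart of this objective, which this summary does not specify.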

Paper Structure

This paper contains 40 sections, 3 theorems, 16 equations, 7 figures, 2 algorithms.

Key Result

lemma 1

Gspi ensures that the highest-priority robot makes monotone progress toward its (possibly changing) assigned goal.

Figures (7)

  • Figure 1: The Generative Collaboration (GCo) framework learns components that are hard to model and plans those that are easy: given image observations, it proposes motions for all objects, jointly generates contact points and manipulation trajectories with flow-matching co-generation, and plans multi-robot paths to convey the team to manipulation sites. From left to right: an illustration of one GCo iteration for seven robots collaboratively manipulating three large objects in a shared space.
  • Figure 2: Performance and scalability analyses for multi-robot multi-object manipulation. Middle and bottom rows: results for single- and multi-object manipulation, respectively. We report overall success rates (left column), overall average distance traveled per robot (middle), and a breakdown of success rates. Numbers following method names denote the number of robots available in the scene. GCo methods consistently outperformed MAPush and Heuristic baselines with $\textsc{GCo}_{DC}$ performing the best. Top: Wall and Slalom setups. Find additional visualizations in the appendix.
  • Figure 3: Extreme scalability analysis for Gspi. Across two different problem sets -- empty environments with extreme robot density on the top row and obstacle-laden environments that require careful coordination on the bottom row -- we tested Gspi and baselines on $270$ planning problems each. Left to right: (a) Illustrations of scenarios. Note that, if robots were not allowed to swap goals, their indices would have been sorted in a row-major order. (b) Success rate results show Gspi solving significantly more problems than baselines. (c) Cost comparisons show Gspi being on par with or improving on solution costs obtained by baselines. (d) Time per iteration is short for Gspi and increases with the problem complexity.
  • Figure 4: MuJoCo environments for our single-object experiments. Left: Empty with three robots, going up. Middle: Easy with two robots, going right. Right: Slalom with one robot, going up.
  • Figure 5: MuJoCo environments for our multi-object experiments. Left: Empty with three robots and five objects, going up. Middle: Wall with six robots, going up. Right: Slalom with nine robots, going up, marked with arrows to avoid clutter.
  • ...and 2 more figures
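
The Figure 3 caption notes that robots may swap goals, which is what makes the planning problem anonymous: any robot may end at any goal. The sketch below illustrates only that idea, under stated assumptions. It is not Gspi, whose mechanism this summary does not describe. The function name `assign_goals`, the brute-force search over permutations, and the straight-line distance cost are all hypothetical choices made for illustration, and the search is only practical for a handful of robots.

```python
from itertools import permutations
from math import dist


def assign_goals(starts, goals):
    """Brute-force anonymous goal assignment (illustrative only, not Gspi).

    Returns (assignment, cost), where assignment[i] is the index of the goal
    given to robot i and cost is the summed straight-line distance.
    """
    if len(starts) != len(goals):
        raise ValueError("need exactly one goal per robot")
    best_perm, best_cost = None, float("inf")
    # Try every robot-to-goal matching and keep the cheapest one.
    for perm in permutations(range(len(goals))):
        cost = sum(dist(starts[i], goals[g]) for i, g in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical 2D positions for three robots and three goals.
starts = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
goals = [(4.0, 1.0), (2.0, 4.0), (0.0, 1.0)]

assignment, cost = assign_goals(starts, goals)
# Cost if each robot had to keep the goal with its own index (no swapping).
fixed_cost = sum(dist(s, g) for s, g in zip(starts, goals))
print(assignment, cost, fixed_cost)
```

In this toy instance each robot is matched to the goal one unit away from it, so allowing swaps gives a much lower total distance than the fixed, index-ordered matching. A scalable planner would replace the factorial-time search with something far cheaper and would also account for collisions, which this sketch ignores.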

Theorems & Definitions (4)

  • definition 1: Completeness in AMRMP
  • lemma 1: Monotone Progress
  • lemma 2: Stability at Goals
  • theorem 1: Completeness under Assumption