MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning

Itamar Mishani, Yorai Shaoul, Maxim Likhachev

TL;DR

MOSAIC tackles long-horizon manipulation by framing planning over a skill space composed of Generators and Connectors, guided by a physics-based world model. It builds a mosaic graph and uses a domain-independent oracle to steer exploration toward regions where skills are competent, enabling robust composition of imperfect primitives. The approach comes with a probabilistic-completeness guarantee and is validated empirically in both simulation and real-world robotic tasks, where MOSAIC outperforms traditional baselines in success rate and efficiency. This skill-centric, physics-informed planning paradigm offers a scalable path toward general-purpose robots capable of solving complex, open-world manipulation tasks.
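The mosaic graph idea can be illustrated with a minimal sketch, assuming states are plain numbers, each generator-produced skill trajectory is stored as a node with a start and end state, and a connector is any function that reports whether it can link the end of one trajectory to the start of another. The names MosaicGraph, try_connect, find_sequence, and toy_connector are hypothetical and not taken from the paper, and the search shown is plain breadth-first search rather than MOSAIC's actual procedure.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MosaicGraph:
    """Hypothetical mosaic graph: nodes are skill trajectories, edges are successful connector links."""
    nodes: dict = field(default_factory=dict)  # node id -> (start_state, end_state)
    edges: dict = field(default_factory=dict)  # node id -> set of successor node ids

    def add_trajectory(self, node_id, start_state, end_state):
        self.nodes[node_id] = (start_state, end_state)
        self.edges.setdefault(node_id, set())

    def try_connect(self, a, b, connector):
        """Ask a connector to bridge the end of trajectory a to the start of trajectory b."""
        _, a_end = self.nodes[a]
        b_start, _ = self.nodes[b]
        if connector(a_end, b_start):
            self.edges[a].add(b)
            return True
        return False

    def find_sequence(self, start_id, goal_id):
        """Breadth-first search for a chain of trajectories from start to goal; None if disconnected."""
        parent = {start_id: None}
        queue = deque([start_id])
        while queue:
            u = queue.popleft()
            if u == goal_id:
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            for v in self.edges[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        return None


def toy_connector(x, y):
    # Hypothetical connector: succeeds only when the two states are close enough.
    return abs(x - y) <= 1.0


g = MosaicGraph()
g.add_trajectory("start", 0.0, 0.0)
g.add_trajectory("push", 0.5, 3.0)  # e.g. a generator trajectory pushing the plate to the edge
g.add_trajectory("pick", 3.5, 6.0)
g.add_trajectory("goal", 6.2, 6.2)
for a in g.nodes:
    for b in g.nodes:
        if a != b:
            g.try_connect(a, b, toy_connector)
print(g.find_sequence("start", "goal"))  # ['start', 'push', 'pick', 'goal']
```

In this toy setup no single connector can bridge the start directly to the goal, so a solution only appears once intermediate skill trajectories exist to link through.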

Abstract

Planning long-horizon manipulation motions using a set of predefined skills is a central challenge in robotics; solving it efficiently could enable general-purpose robots to tackle novel tasks by flexibly composing generic skills. Solutions to this problem lie in an infinite space of parameterized skill sequences, a space in which common incremental methods struggle to find sequences with non-obvious intermediate steps. Some approaches instead reason over lower-dimensional, symbolic spaces, which are more tractable to explore but can be brittle and laborious to construct. In this work, we introduce MOSAIC, a skill-centric, multi-directional planning approach that addresses these challenges by reasoning about which skills to employ and where they are most likely to succeed, using physics simulation to estimate skill execution outcomes. Specifically, MOSAIC employs two complementary skill families: Generators, which identify "islands of competence" where skills are demonstrably effective, and Connectors, which link these skill trajectories by solving boundary value problems. By focusing planning effort on regions of high competence, MOSAIC efficiently discovers physically grounded solutions. We demonstrate its efficacy on complex long-horizon problems in both simulation and the real world, using a diverse set of skills including generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit skill-mosaic.github.io for demonstrations and examples.
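Focusing effort on "islands of competence" can be pictured as biased sampling over candidate (skill, region) pairs. The sketch below is an assumption-laden illustration, not the paper's oracle: a hypothetical CompetenceOracle keeps success and attempt counts from simulated rollouts and picks the next candidate with probability proportional to a Laplace-smoothed success rate. The simulate function, its hidden TRUE_SUCCESS_RATE table, and the candidate names are toy stand-ins for a physics simulator.

```python
import random


class CompetenceOracle:
    """Hypothetical oracle: biases exploration toward (skill, region) pairs
    that succeeded more often in simulated rollouts."""

    def __init__(self, candidates, seed=0):
        self.successes = {c: 0 for c in candidates}
        self.attempts = {c: 0 for c in candidates}
        self.rng = random.Random(seed)

    def competence(self, c):
        # Laplace-smoothed success rate, so untried candidates still get sampled.
        return (self.successes[c] + 1) / (self.attempts[c] + 2)

    def select(self):
        # Sample one candidate with probability proportional to its competence estimate.
        cands = list(self.successes)
        weights = [self.competence(c) for c in cands]
        return self.rng.choices(cands, weights=weights, k=1)[0]

    def update(self, c, succeeded):
        self.attempts[c] += 1
        self.successes[c] += int(succeeded)


# Toy stand-in for physics-simulated rollouts: each candidate has a hidden success rate.
TRUE_SUCCESS_RATE = {("push", "table_edge"): 0.9, ("pick", "table_center"): 0.05}
sim_rng = random.Random(1)


def simulate(candidate):
    # Hypothetical simulator: returns True when the simulated skill execution succeeds.
    return sim_rng.random() < TRUE_SUCCESS_RATE[candidate]


oracle = CompetenceOracle(TRUE_SUCCESS_RATE.keys(), seed=0)
for _ in range(500):
    c = oracle.select()
    oracle.update(c, simulate(c))
print(oracle.attempts)
```

Smoothing keeps every candidate reachable, so the sampler still occasionally revisits low-competence regions instead of abandoning them outright.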

Paper Structure

This paper contains 22 sections, 14 equations, 3 figures, and 1 algorithm.

Figures (3)

  • Figure 1: MOSAIC solves long-horizon manipulation tasks by generating local skill trajectories (circles) and linking them with connector skills (squares). MOSAIC uses the skills themselves to guide exploration toward regions where they are likely to succeed, enabling effective composition of generic local skills to solve complex tasks.
  • Figure 2: Simulation setups and their real-world counterparts. Across all scenarios, the robot must place the plate into the bin. In Scenario 1: Transport (a), the robot must push the plate to the edge in order to pick it up. Scenario 2: Transport in Clutter (b) includes additional objects on the table. The robot must move the plate to the edge without displacing other objects. Scenario 3: Transport Among Movable Objects (c) allows the robot to interact with more objects on the table. The robot discovered the need to clear space for manipulating the plate by moving the chips can elsewhere.
  • Figure 3: Algorithm comparison across experimental scenarios. Left: Success rates. Middle: Planning time density with median, IQR, and average. Right: Head-to-head comparison on test instances solved by both algorithms. The upper-right triangle shows relative planning times; the lower-left triangle shows relative sequence lengths. Each cell compares the "row algorithm" to the "column algorithm." For MOSAIC, lower values are better in the first row, and higher values are better in the first column.
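The head-to-head cells in Figure 3 can be read as row-over-column ratios computed only on tests that both algorithms solved. The caption does not state how per-test ratios are aggregated, so the sketch below assumes the median of per-test ratios; head_to_head and the sample timings are hypothetical illustrations, not data from the paper.

```python
from statistics import median


def head_to_head(row_values, col_values):
    """Hypothetical cell value: median of row/column ratios over tests solved by both.

    row_values, col_values: dict mapping test id -> metric (e.g. planning time).
    Returns None when the two algorithms share no solved tests.
    """
    common = row_values.keys() & col_values.keys()
    if not common:
        return None
    return median([row_values[t] / col_values[t] for t in common])


# Made-up planning times (seconds) for illustration only.
mosaic = {"t1": 2.0, "t2": 4.0, "t3": 3.0}
baseline = {"t1": 8.0, "t2": 8.0, "t4": 1.0}
print(head_to_head(mosaic, baseline))  # 0.375
```

Under this reading, a value below 1 in MOSAIC's row means MOSAIC planned faster on the shared tests; tests solved by only one algorithm (t3 and t4 here) do not enter the ratio.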

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4