MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning
Itamar Mishani, Yorai Shaoul, Maxim Likhachev
TL;DR
MOSAIC tackles long-horizon manipulation by planning over a skill space made up of two skill families, Generators and Connectors, guided by a physics-based world model. It builds a mosaic graph and uses a domain-independent oracle to steer exploration toward regions where skills are competent, enabling robust composition of imperfect primitives. The approach is theoretically grounded through probabilistic completeness and empirically validated in both simulation and real-world robotic tasks, where MOSAIC outperforms traditional baselines in success rate and efficiency. This skill-centric, physics-informed planning paradigm offers a scalable path toward general-purpose robots capable of solving complex, open-world manipulation tasks.
Abstract
Planning long-horizon manipulation motions using a set of predefined skills is a central challenge in robotics; solving it efficiently could enable general-purpose robots to tackle novel tasks by flexibly composing generic skills. Solutions to this problem lie in an infinitely vast space of parameterized skill sequences, a space where common incremental methods struggle to find sequences with non-obvious intermediate steps. Some approaches reason over lower-dimensional, symbolic spaces, which are more tractable to explore but may be brittle and are laborious to construct. In this work, we introduce MOSAIC, a skill-centric, multi-directional planning approach that addresses these challenges by reasoning about which skills to employ and where they are most likely to succeed, using physics simulation to estimate skill execution outcomes. Specifically, MOSAIC employs two complementary skill families: Generators, which identify "islands of competence" where skills are demonstrably effective, and Connectors, which link these skill trajectories by solving boundary value problems. By focusing planning efforts on regions of high competence, MOSAIC efficiently discovers physically grounded solutions. We demonstrate its efficacy on complex long-horizon problems in both simulation and the real world, using a diverse set of skills including generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit skill-mosaic.github.io for demonstrations and examples.
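
To make the Generator/Connector split concrete, the following is a minimal toy sketch and not the MOSAIC implementation. It assumes a one-dimensional state, hard-codes the skill trajectories a Generator might return (a real Generator would roll skills out in a physics simulator), and stands in for the Connector's boundary value problem with a simple reach threshold. All names (`Segment`, `generate_segments`, `connect`, `plan`, `max_reach`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A skill trajectory that carries the state from `start` to `end`."""
    name: str
    start: float
    end: float


def generate_segments() -> list[Segment]:
    """Toy Generator: segments where a skill is assumed to succeed.

    Assumption: outcomes are hard-coded here; a real generator would
    sample skill parameters and evaluate them in simulation.
    """
    return [
        Segment("push", 2.0, 4.0),
        Segment("pick_place", 5.0, 8.0),
    ]


def connect(a: float, b: float, max_reach: float = 1.5) -> bool:
    """Toy Connector: stands in for solving a boundary value problem.

    Assumption: a connection from a to b exists iff the gap is small.
    """
    return abs(b - a) <= max_reach


def plan(start: float, goal: float) -> Optional[list[str]]:
    """Link generated segments with the connector and search the graph."""
    nodes = [
        Segment("start", start, start),
        *generate_segments(),
        Segment("goal", goal, goal),
    ]
    # Directed edge i -> j when the end of i connects to the start of j.
    edges = {
        i: [j for j, t in enumerate(nodes) if j != i and connect(s.end, t.start)]
        for i, s in enumerate(nodes)
    }
    # Breadth-first search from the start node to the goal node.
    goal_idx = len(nodes) - 1
    parent: dict[int, Optional[int]] = {0: None}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == goal_idx:
            path = []
            node: Optional[int] = i
            while node is not None:
                path.append(nodes[node].name)
                node = parent[node]
            return path[::-1]
        for j in edges[i]:
            if j not in parent:
                parent[j] = i
                queue.append(j)
    return None
```

In this toy setting, `plan(1.0, 9.0)` chains both skill segments because no single connection spans the gap from start to goal, while `plan(1.0, 20.0)` returns `None` because nothing reaches the goal. The search here is a plain breadth-first pass over a fixed graph; the multi-directional, oracle-guided exploration described above is not modeled.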
