SparTa: Sparse Graphical Task Models from a Handful of Demonstrations
Adrian Röfer, Nick Heppert, Abhinav Valada
TL;DR
This paper introduces SparTa, an object-centric framework that learns what a manipulation task seeks to achieve by constructing sparse task skeletons from demonstrations. It segments demonstrations into manipulation graphs, derives events from topological changes in these graphs, and matches objects across demonstrations using pre-trained features to extract a minimal, probabilistic task skeleton. The learned model provides distributions over relative object poses at task transitions and enables planning and execution in new environments, including zero-shot transfer to a real robot. Experiments on HANDSOME and RoboCasa, together with real-robot deployment, show robust segmentation and model fidelity that improves as more demonstrations are added, while also revealing failure modes in coordinated or ambiguous tasks.
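The idea of deriving events from topological changes can be illustrated with a minimal sketch. It assumes each frame's manipulation graph is reduced to a set of undirected edges between object names (for example, contact or grasp relations) and emits an event whenever that edge set changes; the phases between events then form the demonstration segments. The graph representation and the names `edge`, `Event`, `extract_events`, and `segments` are hypothetical and are not taken from SparTa's implementation, whose graphs and events may carry considerably richer information.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# Hypothetical representation: an undirected relation is an unordered pair of object names,
# and a frame's manipulation graph is the set of such relations.
Edge = FrozenSet[str]
Graph = FrozenSet[Edge]


class Event(NamedTuple):
    """A topological change between frame t-1 and frame t."""
    t: int
    added: Graph
    removed: Graph


def edge(a: str, b: str) -> Edge:
    """Build an undirected edge between two objects."""
    return frozenset((a, b))


def extract_events(graphs: List[Graph]) -> List[Event]:
    """Emit one event per frame whose edge set differs from the previous frame."""
    events = []
    for t in range(1, len(graphs)):
        prev, curr = graphs[t - 1], graphs[t]
        if curr != prev:
            events.append(Event(t, curr - prev, prev - curr))
    return events


def segments(num_frames: int, events: List[Event]) -> List[Tuple[int, int]]:
    """Split the frame range [0, num_frames) into phases bounded by event times."""
    cuts = [0] + [e.t for e in events] + [num_frames]
    return list(zip(cuts, cuts[1:]))


# Toy demonstration: grasp a cup, lift it off the table, place it on a shelf, release it.
on_table = frozenset({edge("cup", "table")})
grasped = frozenset({edge("cup", "table"), edge("gripper", "cup")})
held = frozenset({edge("gripper", "cup")})
placing = frozenset({edge("gripper", "cup"), edge("cup", "shelf")})
on_shelf = frozenset({edge("cup", "shelf")})

graphs = [on_table, on_table, grasped, held, held, placing, on_shelf, on_shelf]
events = extract_events(graphs)
for e in events:
    print(e.t, "added:", [sorted(x) for x in e.added], "removed:", [sorted(x) for x in e.removed])
print(segments(len(graphs), events))
```

Because only changes in the edge set trigger events, repeated frames (such as the cup resting on the table before the grasp) collapse into a single phase, which is what keeps the resulting skeleton sparse.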
Abstract
Learning long-horizon manipulation tasks efficiently is a central challenge in robot learning from demonstration. Unlike recent work that learns tasks directly in the action domain, we focus on inferring what the robot should achieve in a task rather than how to achieve it. To this end, we represent the evolving scene state as a sequence of graphs of object relationships. We propose a demonstration segmentation and pooling approach that extracts a series of manipulation graphs and estimates distributions over object states across task phases. In contrast to prior graph-based methods that capture only partial interactions or short temporal windows, our approach captures complete object interactions, from the onset of control to the end of the manipulation. To improve robustness when learning from multiple demonstrations, we additionally match objects across demonstrations using pre-trained visual features. In extensive experiments, we evaluate our method's demonstration segmentation accuracy and the benefit of learning from multiple demonstrations for recovering the desired minimal task model. Finally, we deploy the fitted models both in simulation and on a real robot, demonstrating that the resulting task representations support reliable execution across environments.
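Cross-demonstration object matching can be sketched as a one-to-one assignment problem. This sketch assumes each object already has a pre-trained visual feature vector (the toy three-dimensional vectors below stand in for real embeddings) and picks the assignment that maximizes total cosine similarity. The exhaustive search over permutations is an illustrative stand-in that is only practical for a handful of objects; for larger scenes, the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` with `maximize=True`) solves the same problem in polynomial time. The names `cosine` and `match_objects` are hypothetical, and SparTa's actual matching criterion may differ.

```python
import math
from itertools import permutations
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_objects(
    feats_a: Dict[str, Sequence[float]], feats_b: Dict[str, Sequence[float]]
) -> Dict[str, str]:
    """Match every object in demo A to a distinct object in demo B so that the
    summed cosine similarity is maximal. Exhaustive search over permutations,
    so only suitable for small object counts."""
    names_a, names_b = list(feats_a), list(feats_b)
    if len(names_a) > len(names_b):
        raise ValueError("demo B must contain at least as many objects as demo A")
    best_perm, best_score = None, -math.inf
    for perm in permutations(names_b, len(names_a)):
        score = sum(cosine(feats_a[a], feats_b[b]) for a, b in zip(names_a, perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return dict(zip(names_a, best_perm))


# Toy features: demo B uses different object IDs and contains a distractor (obj_9).
demo_a = {"cup": [1.0, 0.0, 0.0], "plate": [0.0, 1.0, 0.0]}
demo_b = {"obj_7": [0.1, 0.9, 0.0], "obj_3": [0.95, 0.05, 0.1], "obj_9": [0.0, 0.0, 1.0]}
print(match_objects(demo_a, demo_b))
```

Solving a joint assignment, rather than matching each object to its single most similar candidate, prevents two objects from one demonstration being mapped to the same object in another.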
