Learning Multi-Step Manipulation Tasks from A Single Human Demonstration
Dingkun Guo
TL;DR
This work tackles data-efficient learning of multi-step robot manipulation from a single human demonstration. It presents a three-module system (vision, learning, and manipulation) that converts RGBD demonstrations into executable robot primitives by identifying task-relevant key poses with Grounded Segment Anything and by tracking hand-object and object-object contacts. The learning module segments actions into three primitives, Make, Maintain, and Break, and generates policies that map object poses and contacts to robot motions. The manipulation module adapts these policies to different robots and environments through pose proposals and collision-aware planning. Experiments in a lab and a home kitchen show notable per-step success in seen contexts, while generalizing to unseen objects and environments remains challenging, underscoring the importance of robust pose estimation and motion planning. The approach offers a data-efficient, modular pathway toward generalizable robot manipulation from a single demonstration, with clear avenues for extending it to more complex objects and tasks.
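To make the Make-Maintain-Break idea concrete, here is a minimal sketch of how a single tracked contact timeline (for example, hand-plate contact sampled once per video frame) could be split into those three primitives. It is a hypothetical illustration, not the authors' implementation: the boolean-per-frame input, the `Primitive` type, the `segment_contacts` function, and the rule that Make and Break are single-frame transition events while Maintain spans the in-contact frames are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Primitive(NamedTuple):
    """Hypothetical primitive record; start/end are inclusive frame indices."""
    kind: str   # "make", "maintain", or "break"
    start: int
    end: int


def segment_contacts(contact: List[bool]) -> List[Primitive]:
    """Split a per-frame contact timeline into Make/Maintain/Break primitives.

    Assumed rule (illustrative only):
      - Make: the frame where a contact episode begins.
      - Maintain: every frame of that contact episode.
      - Break: the first frame after the episode where contact is absent
        (omitted if the demonstration ends while still in contact).
    """
    prims: List[Primitive] = []
    n = len(contact)
    i = 0
    while i < n:
        if not contact[i]:
            i += 1  # free-space frame: no primitive boundary here
            continue
        start = i
        while i < n and contact[i]:
            i += 1  # advance through the contiguous in-contact run
        end = i - 1  # last frame still in contact
        prims.append(Primitive("make", start, start))
        prims.append(Primitive("maintain", start, end))
        if i < n:
            prims.append(Primitive("break", i, i))
    return prims


if __name__ == "__main__":
    # Hand touches an object at frames 2-4, releases, then grasps again at 7-8.
    timeline = [False, False, True, True, True, False, False, True, True]
    for p in segment_contacts(timeline):
        print(p.kind, p.start, p.end)
```

In a full pipeline one would presumably run a routine like this per tracked pair (hand-object and object-object) and attach the object key poses at each boundary frame; keeping Make and Break as instantaneous events makes the transitions easy to align across pairs.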
Abstract
Learning from human demonstrations has achieved remarkable results in robot manipulation. However, it remains challenging to develop a robot system that matches humans in data efficiency and generalization, particularly in complex, unstructured real-world scenarios. We propose a system that processes RGBD videos to translate human actions into robot primitives and identifies task-relevant key poses of objects using Grounded Segment Anything. We then address the challenges robots face in replicating human actions, given the differences between humans and robots in kinematics and collision geometry. To test the effectiveness of our system, we conducted experiments on manual dishwashing. With a single human demonstration recorded in a mockup kitchen, the system achieved per-step success rates of 50-100% and a whole-task success rate of up to 40% with different objects in a home kitchen. Videos are available at https://robot-dishwashing.github.io
