Vision-based Manipulation from Single Human Video with Open-World Object Graphs
Yifeng Zhu, Arisrei Lim, Peter Stone, Yuke Zhu
TL;DR
This paper tackles vision-based robot manipulation from a single human video in open-world settings. It introduces ORION, an object-centric framework that builds Open-world Object Graphs (OOGs) from the demonstration and uses them to generate a manipulation plan that guides the robot. The resulting policies generalize across visual backgrounds, camera viewpoints, spatial layouts, and unseen object instances. The approach combines plan generation from video with SE(3) trajectory optimization and impedance-controlled execution, and it works with both RGB-D and RGB-only demonstration videos. Key contributions include a formal problem formulation for open-world imitation from observation, the OOG representation, and a method for constructing a policy from a single video that scales to long-horizon tasks. Across short- and long-horizon tasks, ORION reaches an average success rate of 74.4%, and ablations and RGB-only variants highlight the effectiveness of object-centric reasoning and TAP-based keypoint tracking.
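To make the phrase "SE(3) trajectory" concrete: one standard, generic way to turn 3D keypoints tracked on an object across video frames into a rigid SE(3) motion is a least-squares fit with the Kabsch algorithm. The sketch below illustrates only that generic technique, assuming noise-free, non-collinear point correspondences; it is not presented as ORION's trajectory optimizer, and the function names `fit_rigid_transform` and `keypoint_track_to_se3` are hypothetical.

```python
import numpy as np

def fit_rigid_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Kabsch least-squares rigid transform mapping src -> dst.

    src, dst: (N, 3) corresponding 3D keypoints (N >= 3, not collinear).
    Returns a 4x4 homogeneous SE(3) matrix T with dst ~= R @ src + t.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def keypoint_track_to_se3(track: np.ndarray) -> list:
    """Hypothetical helper: track is (T, N, 3) keypoints over T frames.

    Returns T object poses, each relative to the first frame.
    """
    return [fit_rigid_transform(track[0], track[k]) for k in range(track.shape[0])]

# Usage: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
moved = pts @ R_true.T + t_true  # each row is R_true @ p + t_true
T_est = fit_rigid_transform(pts, moved)
poses = keypoint_track_to_se3(np.stack([pts, moved]))
```

With real tracks, depth noise and occluded points would make the fit approximate, so a practical pipeline would typically add outlier rejection or smoothing over time; those steps are outside this sketch.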
Abstract
This work presents an object-centric approach to learning vision-based manipulation skills from human videos. We investigate the problem of robot manipulation via imitation in the open-world setting, where a robot learns to manipulate novel objects from a single video demonstration. We introduce ORION, an algorithm that tackles this problem by extracting an object-centric manipulation plan from a single RGB or RGB-D video and deriving a policy conditioned on the extracted plan. Our method enables the robot to learn from videos captured with everyday mobile devices and to generalize its policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, using RGB-D and RGB-only demonstration videos. Across these tasks and demonstration types, ORION achieves an average success rate of 74.4%, demonstrating its efficacy in learning from a single human video in the open world. Additional materials can be found on our project website: https://ut-austin-rpl.github.io/ORION-release.
