ARCap: Collecting High-quality Human Demonstrations for Robot Learning with Augmented Reality Feedback
Sirui Chen, Chen Wang, Kaden Nguyen, Li Fei-Fei, C. Karen Liu
TL;DR
ARCap addresses the challenge of scaling up imitation-learning data by giving users real-time AR feedback: it retargets human motion to diverse robot embodiments, visualizes the result, and performs collision checks. The system supports cross-embodiment data collection, and a diffusion-based imitation-learning pipeline trained on ARCap data is demonstrated on cluttered-object manipulation and long-horizon tasks. User studies and real-robot experiments show that ARCap improves data quality, reduces collisions and kinematic violations, and enables successful policies across different end-effectors. The work offers an open-source, portable solution that broadens access to robot learning, with potential extensions to mobile humanoids and to data collection guided by language models.
Abstract
Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works have started to use portable data collection devices that do not require physical robot hardware. However, because there is no on-robot feedback during data collection, data quality depends heavily on user expertise, and many devices are limited to specific robot embodiments. We propose ARCap, a portable data collection system that provides visual feedback through augmented reality (AR) and haptic warnings to guide users in collecting high-quality demonstrations. Through extensive user studies, we show that ARCap enables novice users to collect robot-executable data that matches robot kinematics and avoids collisions with the scene. With data collected from ARCap, robots can perform challenging tasks, such as manipulation in cluttered environments and long-horizon cross-embodiment manipulation. ARCap is fully open-source and easy to calibrate; all components are built from off-the-shelf products. More details and results can be found on our website: https://stanford-tml.github.io/ARCap
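To make the feedback idea concrete, the following is a minimal sketch of the kind of per-frame check that could decide when to warn the user. It is not ARCap's implementation: the function `check_pose`, the `Feedback` type, the sphere-shaped obstacles, and the `margin` parameter are all hypothetical names and simplifications chosen for illustration. It assumes the human pose has already been retargeted to robot joint angles and that a set of sample points on the robot's links is available.

```python
import math
from dataclasses import dataclass


@dataclass
class Feedback:
    """Result of one per-frame check (hypothetical type, for illustration)."""
    joint_limit_violation: bool
    collision: bool

    @property
    def warn(self) -> bool:
        # A warning (e.g. a haptic pulse) would fire on either kind of violation.
        return self.joint_limit_violation or self.collision


def check_pose(q, q_min, q_max, link_points, obstacles, margin=0.01):
    """Check a retargeted robot pose against joint limits and sphere obstacles.

    q, q_min, q_max: joint angles and their limits (same length, radians).
    link_points:     3D sample points on the robot links for this pose.
    obstacles:       list of (center, radius) spheres approximating the scene
                     (an assumption; a real system may use other geometry).
    margin:          extra clearance in meters before a collision is flagged.
    """
    out_of_range = any(
        not (lo <= qi <= hi) for qi, lo, hi in zip(q, q_min, q_max)
    )
    in_collision = any(
        math.dist(p, center) < radius + margin
        for p in link_points
        for center, radius in obstacles
    )
    return Feedback(out_of_range, in_collision)


# Example: one spherical obstacle of radius 0.1 m at the origin.
scene = [((0.0, 0.0, 0.0), 0.1)]
limits_lo, limits_hi = [-1.5, -1.5], [1.5, 1.5]

ok = check_pose([0.0, 1.0], limits_lo, limits_hi, [(0.0, 0.0, 0.5)], scene)
too_far = check_pose([2.0, 0.0], limits_lo, limits_hi, [(0.0, 0.0, 0.5)], scene)
touching = check_pose([0.0, 1.0], limits_lo, limits_hi, [(0.0, 0.0, 0.05)], scene)
```

In this sketch, `ok.warn` is false, while `too_far` flags a joint-limit violation and `touching` flags a collision. Running such a check on every frame during collection, rather than after the fact, is what lets a novice correct a demonstration before it is recorded.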
