Self-supervised 6-DoF Robot Grasping by Demonstration via Augmented Reality Teleoperation System
Xiwen Dengxiong, Xueting Wang, Shi Bai, Yunbo Zhang
TL;DR
This work tackles 6-DoF grasp pose detection for unknown objects in restricted environments where grasp pose annotations are impractical. It introduces a self-supervised framework that uses an AR teleoperation system to collect human demonstrations and learn a contrastive point-cloud representation, producing 6-DoF grasp poses without explicit grasp labels. A key contribution is the demonstration learning module, which maps morphology-based demonstrations to pose adjustments and yields accurate grasps after only a few demonstrations. Real-world experiments show that the approach improves grasp success on unseen objects and reduces the annotation burden, with sub-second planning and teleoperation latency, making it practical for remote or hazardous settings.
Abstract
Most existing 6-DoF robot grasping solutions depend on strong supervision of grasp poses to ensure satisfactory performance, which can be laborious and impractical when the robot works in a restricted area. To this end, we propose a self-supervised 6-DoF grasp pose detection framework built on an Augmented Reality (AR) teleoperation system that efficiently learns from human demonstrations and provides 6-DoF grasp poses without grasp pose annotations. Specifically, the system collects human demonstrations in the AR environment and contrastively learns the grasping strategy from them. In real-world experiments, the proposed system achieves satisfactory grasping performance and learns to grasp unknown objects within three demonstrations.
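
The abstract does not state which contrastive objective is used. As a minimal sketch only, assuming an InfoNCE-style loss over paired embeddings (for instance, a feature of an object point cloud and a feature derived from its demonstrated grasp), the idea of pulling matched pairs together and pushing mismatched pairs apart could look like the following. The function name `info_nce_loss`, the `temperature` value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math

def _normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Hypothetical InfoNCE-style loss: anchors[i] should match positives[i].

    Every other row of `positives` in the batch acts as a negative for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Similarity of anchor i to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings (assumed values, for illustration only).
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
shuffled = matched[1:] + matched[:1]  # deliberately misaligned pairs

loss_matched = info_nce_loss(anchors, matched)
loss_shuffled = info_nce_loss(anchors, shuffled)
```

In this toy setup the correctly paired batch gives a much smaller loss than the misaligned one, which is the signal a contrastive learner would follow. No grasp pose labels appear anywhere in the objective, only the pairing itself.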
