Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation
Zoey Qiuyu Chen, Karl Van Wyk, Yu-Wei Chao, Wei Yang, Arsalan Mousavian, Abhishek Gupta, Dieter Fox
TL;DR
The paper tackles the challenge of robust dexterous grasping by bootstrapping a small set of human demonstrations into a large, diverse training dataset through implicit shape augmentation using a correspondence-aware deformation model (DIF-Net). It integrates human-to-robot retargeting, dense shape deformations with grasp transfer, and dynamics-aware refinement, training a PointNet++-based policy that operates on point clouds to predict pregrasp and final grasp poses. The approach demonstrates strong sim-to-real transfer, achieving 79% real-world success on unseen objects and robust generalization across multiple unseen object categories, with extensive ablations underscoring the importance of dense correspondences and augmentation. This work provides a scalable framework for dexterous manipulation by leveraging learned shape correspondences to generate realistic, dynamically consistent grasps for novel objects.
Abstract
Dexterous robotic hands have the capability to interact with a wide variety of household objects to perform tasks like grasping. However, learning robust real-world grasping policies for arbitrary objects has proven challenging due to the difficulty of generating high-quality training data. In this work, we propose a learning system (ISAGrasp) for leveraging a small number of human demonstrations to bootstrap the generation of a much larger dataset containing successful grasps on a variety of novel objects. Our key insight is to use a correspondence-aware implicit generative model to deform object meshes and demonstrated human grasps in order to generate a diverse dataset of novel objects and successful grasps for supervised learning, while maintaining semantic realism. We use this dataset to train a robust grasping policy in simulation which can be deployed in the real world. We demonstrate grasping performance with a four-fingered Allegro hand in both simulation and the real world, and show that this method can handle entirely new semantic classes and achieve a 79% success rate on grasping unseen objects in the real world.
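
The idea of deforming an object and carrying a demonstrated grasp along with it can be illustrated with a minimal sketch. This is not the ISAGrasp implementation: the learned implicit deformation model is replaced by a hypothetical analytic stand-in (`toy_deformation`, an anisotropic scaling), and the function `transfer_contacts`, the sphere-shaped toy object, and the four fingertip contacts are all assumptions made for illustration. The sketch only shows how dense point-to-point correspondences let contact locations follow the surface as the shape changes.

```python
import numpy as np

def toy_deformation(points, scale=(1.0, 1.0, 1.5)):
    """Hypothetical stand-in for a learned deformation field: anisotropic scaling."""
    return points * np.asarray(scale)

def transfer_contacts(source_points, deformed_points, contacts):
    """Move each contact with its nearest source surface point.

    source_points and deformed_points are (N, 3) arrays in dense one-to-one
    correspondence (row i of one corresponds to row i of the other).
    contacts is a (K, 3) array of fingertip contact locations on the source.
    """
    # Pairwise squared distances between contacts and source points: (K, N).
    diffs = contacts[:, None, :] - source_points[None, :, :]
    nearest = np.argmin(np.sum(diffs ** 2, axis=-1), axis=1)
    # Keep each contact's offset from its anchor point, then follow the anchor.
    offsets = contacts - source_points[nearest]
    return deformed_points[nearest] + offsets

rng = np.random.default_rng(0)
# Sample points on a unit sphere as a toy source object.
source = rng.normal(size=(500, 3))
source /= np.linalg.norm(source, axis=1, keepdims=True)
deformed = toy_deformation(source)

# Four hypothetical fingertip contacts lying on the source surface.
contacts = source[[0, 10, 20, 30]]
new_contacts = transfer_contacts(source, deformed, contacts)
```

Because the contacts here coincide with source surface points, each one lands exactly on the corresponding point of the deformed shape. A purely geometric transfer like this says nothing about whether the resulting grasp is physically stable, so transferred grasps would still need to be checked or refined in simulation before serving as training labels.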
