Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation

Zoey Qiuyu Chen; Karl Van Wyk; Yu-Wei Chao; Wei Yang; Arsalan Mousavian; Abhishek Gupta; Dieter Fox

TL;DR

The paper tackles the challenge of robust dexterous grasping by bootstrapping a small set of human demonstrations into a large, diverse training dataset through implicit shape augmentation using a correspondence-aware deformation model (DIF-Net). It integrates human-to-robot retargeting, dense shape deformations with grasp transfer, and dynamics-aware refinement, training a PointNet++-based policy that operates on point clouds to predict pregrasp and final grasp poses. The approach demonstrates strong sim-to-real transfer, achieving 79% real-world success on unseen objects and robust generalization across multiple unseen object categories, with extensive ablations underscoring the importance of dense correspondences and augmentation. This work provides a scalable framework for dexterous manipulation by leveraging learned shape correspondences to generate realistic, dynamically consistent grasps for novel objects.

Abstract

Dexterous robotic hands can interact with a wide variety of household objects to perform tasks like grasping. However, learning robust real-world grasping policies for arbitrary objects has proven challenging due to the difficulty of generating high-quality training data. In this work, we propose a learning system (ISAGrasp) that leverages a small number of human demonstrations to bootstrap the generation of a much larger dataset containing successful grasps on a variety of novel objects. Our key insight is to use a correspondence-aware implicit generative model to deform object meshes and demonstrated human grasps, generating a diverse dataset of novel objects and successful grasps for supervised learning while maintaining semantic realism. We use this dataset to train a robust grasping policy in simulation that can be deployed in the real world. We demonstrate grasping performance with a four-fingered Allegro hand in both simulation and the real world, and show that this method can handle entirely new semantic classes and achieve a 79% success rate on grasping unseen objects in the real world.
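
The augmentation loop described in the abstract can be illustrated with a toy sketch. This is not the ISAGrasp implementation: `deform` is a hypothetical stand-in for the learned deformation model (here just a latent-controlled per-axis scaling), `transfer_grasp`, `is_stable`, and `refine_by_rejection` are hypothetical names, and the success check replaces a physics simulation with a simple distance test. It is meant only to show the structure: sample a latent, deform the object, carry the demonstrated contacts through the same map, then keep a perturbed grasp that passes a check.

```python
import numpy as np

def deform(points, latent):
    """Hypothetical stand-in for a learned deformation field.
    Scales each axis by exp(latent); a learned model would be nonlinear."""
    return points * np.exp(latent)

def transfer_grasp(contacts, latent):
    """Carry contact points through the same map as the object surface,
    so they land on the corresponding regions of the deformed object."""
    return deform(contacts, latent)

def is_stable(contacts, surface, tol=0.05):
    """Toy success check: every contact lies within tol of a surface point.
    A real pipeline would evaluate the grasp in a physics simulator."""
    d = np.linalg.norm(contacts[:, None, :] - surface[None, :, :], axis=-1)
    return bool(np.all(d.min(axis=1) < tol))

def refine_by_rejection(contacts, surface, rng, n_trials=50, noise=0.02):
    """Perturb the transferred grasp; return the first candidate that
    passes the check, or None if every trial is rejected."""
    for _ in range(n_trials):
        cand = contacts + rng.normal(scale=noise, size=contacts.shape)
        if is_stable(cand, surface):
            return cand
    return None

rng = np.random.default_rng(0)

# Reference object: points on a unit sphere (toy data).
ref = rng.normal(size=(500, 3))
ref /= np.linalg.norm(ref, axis=1, keepdims=True)
ref_contacts = ref[:4].copy()  # demonstrated contacts lie on the surface

# One augmented sample: new shape plus transferred and refined grasp.
latent = rng.normal(scale=0.3, size=3)
new_obj = deform(ref, latent)
new_contacts = transfer_grasp(ref_contacts, latent)
refined = refine_by_rejection(new_contacts, new_obj, rng)
```

Because the contacts and the surface pass through the same map, the transferred contacts stay on the deformed surface by construction. With a learned, nonlinear deformation that property would depend on the quality of the dense correspondences.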

Paper Structure (15 sections, 3 equations, 16 figures, 3 tables)

Introduction
ISAGrasp: Dexterous Grasping Policies via Implicit Shape Augmentation
Human-Robot Retargeting
Implicit Shape Augmentation
Grasp Refinement for Dynamics Consistency
Policy Learning via Supervised Learning
System Details
Experiments
Simulation Results
Real World Experiments
Ablations and Analysis
Related Work
Manipulation with Dexterous Hands
Data Augmentation and Robustness
Conclusion

Figures (16)

Figure 1: Implicit Shape Augmentation for generating an augmented dataset from demonstrations. First, a human demonstration is retargeted onto the Allegro hand to generate meshes and grasp labels. This data can then be used to generate a variety of new objects via shape augmentation with DIF-Net (Deng et al., 2021). Grasps for these deformed objects can then be further refined with rejection sampling to generate dynamically consistent grasps.
Figure 2: Illustration of retargeting a human hand demonstration to an Allegro hand.
Figure 3: Optimization setup describing the retargeting problem from the human hand to the Allegro hand.
Figure 4: Deformation map and grasping correspondences for objects generated in ISAGrasp. Grasping correspondences on the original object (reference) and the deformed objects are highlighted inside the circle. As can be seen, object semantics are maintained. Different object instances are generated by sampling different latents.
Figure 5: Policy network architecture. The PointNet++ architecture takes as input an object point cloud $p$, the object surface normals $\vec{N_o}$, the table normal $\vec{N_t}$, the robot facing direction $\vec{N_f}$, and the pointing direction $\vec{N_p}$, and generates the palm translation, rotation, and finger joints for the dexterous grasp.
...and 11 more figures
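
The inputs and outputs listed in the Figure 5 caption can be made concrete with a small sketch. Several details here are assumptions, not taken from the paper: tiling the three global direction vectors onto every point, representing rotation as a 3-vector, and laying the output out as one flat vector for a single grasp pose. The function names `build_point_features` and `split_grasp` are hypothetical. The joint count of 16 reflects the Allegro hand's four fingers with four joints each.

```python
import numpy as np

N_JOINTS = 16  # Allegro hand: 4 fingers x 4 joints

def build_point_features(points, normals, table_n, facing_n, pointing_n):
    """Stack per-point inputs into an (N, 15) array.
    Columns: xyz (3), surface normal (3), then the table normal, facing
    direction, and pointing direction (9) repeated for every point.
    Repeating the global vectors per point is an assumption."""
    n = points.shape[0]
    global_dirs = np.concatenate([table_n, facing_n, pointing_n])  # (9,)
    tiled = np.broadcast_to(global_dirs, (n, 9))
    return np.concatenate([points, normals, tiled], axis=1)

def split_grasp(output):
    """Split a flat 22-vector into grasp components.
    Assumed layout: 3 translation, 3 rotation, 16 joint values."""
    assert output.shape == (3 + 3 + N_JOINTS,)
    return {
        "translation": output[:3],
        "rotation": output[3:6],
        "joints": output[6:],
    }

# Toy usage with random data in place of a real point cloud.
rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
nrm = rng.normal(size=(128, 3))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
table_n = np.array([0.0, 0.0, 1.0])
facing_n = np.array([1.0, 0.0, 0.0])
pointing_n = np.array([0.0, 1.0, 0.0])

features = build_point_features(pts, nrm, table_n, facing_n, pointing_n)
grasp = split_grasp(np.arange(22, dtype=float))
```

A network predicting both a pregrasp and a final grasp, as the TL;DR describes, would need two such output groups; how the actual policy arranges them is not stated here.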
