Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng
TL;DR
Dexterous Grasp Transformer (DGTR) reframes dexterous grasp generation as set prediction, using a transformer decoder with learnable grasp queries to predict a diverse set of high-quality grasps in a single forward pass. To overcome the optimization challenges of set-based learning and of penetration penalties, it introduces Dynamic-Static Matching Training (DSMT) and Adversarial-Balanced Test-Time Adaptation (AB-TTA), improving stability, diversity, and feasibility on DexGraspNet. Quantitative results show that DGTR outperforms state-of-the-art one-shot methods in grasp quality and diversity while remaining efficient, and ablations confirm the contributions of DSMT and AB-TTA. The work lays a foundation for fast, robust dexterous grasp generation in real-world robotic manipulation, reducing computation and data pre-processing requirements while broadening the directional diversity of grasps.
Each grasp pose consists of a hand rotation $R \in SO(3)$, a translation $t \in \mathbb{R}^{3}$, and a joint configuration in $\mathbb{R}^{J}$, with $J=22$ for ShadowHand.
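The pose parameterization ($R \in SO(3)$, $t \in \mathbb{R}^{3}$, joints in $\mathbb{R}^{22}$) can be made concrete as a small data structure. The sketch below is illustrative only: the class name `GraspPose`, its field and method names, storing $R$ as a full row-major 3x3 matrix, and the tolerance are assumptions, not DGTR's code, and the network may well regress a different rotation representation. It checks the two conditions that define $SO(3)$ ($R^\top R = I$ and $\det R = +1$) and flattens a grasp into $9 + 3 + 22 = 34$ numbers.

```python
import math
from dataclasses import dataclass
from typing import List

NUM_JOINTS = 22  # J = 22 for ShadowHand, as stated in the text

Matrix3 = List[List[float]]


def _det3(m: Matrix3) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


@dataclass
class GraspPose:
    """Hypothetical container for one grasp: rotation R, translation t, joints q."""
    R: Matrix3      # 3x3 rotation matrix, expected to lie in SO(3)
    t: List[float]  # wrist translation in R^3
    q: List[float]  # joint configuration in R^J

    def is_valid(self, tol: float = 1e-6) -> bool:
        """Check shapes, orthonormality (R^T R = I) and det(R) = +1."""
        if len(self.R) != 3 or any(len(row) != 3 for row in self.R):
            return False
        if len(self.t) != 3 or len(self.q) != NUM_JOINTS:
            return False
        # Columns must have unit length and be mutually orthogonal.
        for i in range(3):
            for j in range(3):
                dot = sum(self.R[k][i] * self.R[k][j] for k in range(3))
                if abs(dot - (1.0 if i == j else 0.0)) > tol:
                    return False
        # A proper rotation has determinant +1; this excludes reflections.
        return abs(_det3(self.R) - 1.0) <= tol

    def to_vector(self) -> List[float]:
        """Flatten to 9 + 3 + J numbers: row-major R, then t, then q."""
        return [x for row in self.R for x in row] + list(self.t) + list(self.q)


c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
rot_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]  # 45 degrees about z
grasp = GraspPose(R=rot_z, t=[0.0, 0.0, 0.1], q=[0.0] * NUM_JOINTS)
print(grasp.is_valid(), len(grasp.to_vector()))  # True 34
```

The determinant check matters because a reflection such as diag(1, 1, -1) is orthonormal but is not a rotation, so `is_valid` rejects it.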
Abstract
In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), which predicts a diverse set of feasible grasp poses from an object point cloud in a single forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we find that this set prediction paradigm faces several optimization challenges in dexterous grasping that limit its performance. To address them, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy improves optimization stability during training. Second, adversarial-balanced test-time adaptation (AB-TTA) uses a pair of adversarial losses to improve grasp quality at test time. Experimental results on the DexGraspNet dataset demonstrate that DGTR predicts dexterous grasp poses with both high quality and high diversity. Notably, while maintaining high quality, DGTR significantly outperforms previous works in grasp diversity across multiple metrics, without any data pre-processing. Code is available at https://github.com/iSEE-Laboratory/DGTR .
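DSMT is only named in the abstract, so what follows is a hedged toy sketch of the general idea behind set-prediction matching, not DGTR's implementation. In set prediction (as in DETR-style detectors), each predicted grasp must be paired with a ground-truth grasp before a loss can be computed. With *dynamic* matching the assignment is recomputed from the current predictions at every step, so it can flip as predictions move and a query's regression target changes; one plausible reading of a *static* phase is to freeze a per-object assignment and reuse it. Everything here is an assumption for illustration: the squared-L2 cost on flattened pose vectors, the switch step, the per-object cache, and the names `pose_cost`, `optimal_matching`, and `DynamicStaticMatcher`. The brute-force search over permutations stands in for the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), which solves the same minimum-cost assignment problem efficiently.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Assignment = List[Tuple[int, int]]  # (prediction index, target index) pairs


def pose_cost(pred: Vector, target: Vector) -> float:
    """Illustrative matching cost: squared L2 distance between flattened grasp vectors."""
    return sum((p - g) ** 2 for p, g in zip(pred, target))


def optimal_matching(preds: List[Vector], targets: List[Vector]) -> Assignment:
    """Minimum-cost one-to-one assignment of each prediction to a distinct target.

    Brute force over ordered target subsets, so only usable for tiny sets; the
    Hungarian algorithm solves the same problem in polynomial time.
    """
    if len(preds) > len(targets):
        raise ValueError("this sketch assumes len(preds) <= len(targets)")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(targets)), len(preds)):
        cost = sum(pose_cost(preds[i], targets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


class DynamicStaticMatcher:
    """Hypothetical helper: re-match every step before `switch_step` (dynamic),
    then freeze one assignment per object and reuse it (static)."""

    def __init__(self, switch_step: int) -> None:
        self.switch_step = switch_step
        self.frozen: Dict[str, Assignment] = {}

    def match(self, step: int, obj_id: str,
              preds: List[Vector], targets: List[Vector]) -> Assignment:
        if step < self.switch_step:
            # Dynamic: recomputed from the current predictions, so it may change every step.
            return optimal_matching(preds, targets)
        if obj_id not in self.frozen:
            # Static: computed the first time this object is seen after the switch...
            self.frozen[obj_id] = optimal_matching(preds, targets)
        # ...and reused unchanged afterwards, even if the predictions move.
        return self.frozen[obj_id]


targets = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy 2-D "grasps"
matcher = DynamicStaticMatcher(switch_step=2)
print(matcher.match(0, "mug", [[0.9, 1.1], [0.1, 0.0]], targets))  # [(0, 1), (1, 0)]
print(matcher.match(2, "mug", [[0.1, 0.0], [0.9, 1.1]], targets))  # [(0, 0), (1, 1)]
print(matcher.match(3, "mug", [[0.9, 1.1], [0.1, 0.0]], targets))  # [(0, 0), (1, 1)]
```

At step 3 the predictions would dynamically match as `[(0, 1), (1, 0)]`, but the assignment frozen at step 2 is returned instead, so each query keeps a fixed target from that point on.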
