Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation
Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson, Animesh Garg
TL;DR
The paper addresses the data bottleneck in multi-finger grasping by introducing a differentiable grasping simulator, Fast-Grasp'D, and a large-scale dataset, Grasp'D-1M. The simulator enables gradient-based optimization to search the full grasp space with thousands of contacts, producing more stable and contact-rich grasps than analytic baselines. Grasp'D-1M covers three-, four-, and five-fingered robotic hands with multimodal visual inputs, and its grasps improve substantially on GraspIt! and prior differentiable methods in grasp quality metrics such as $\epsilon$ and $Vol$. Training a vision-based grasp predictor on Grasp'D-1M demonstrates practical impact, with ~30% more contact area, ~33% higher $\epsilon$, and ~35% lower simulated displacement, underscoring the value of differentiable data synthesis for downstream perception tasks.
Abstract
Multi-finger grasping relies on high-quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three-, four-, and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M, and showing a dramatic increase in model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.
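
To make the idea of gradient-based grasp search concrete, the following is a toy sketch and not the Fast-Grasp'D simulator. It assumes a 2D disc-shaped object, three point fingertips, and a hand-written smooth loss with an analytic gradient. The names `grasp_loss_and_grad`, `optimize_grasp`, `RADIUS`, and `BALANCE_WEIGHT` are hypothetical and chosen only for illustration. The loss pulls each fingertip onto the disc boundary and penalizes any imbalance in the unit contact directions, a crude stand-in for a stability objective.

```python
import math

RADIUS = 1.0          # toy object: a disc of this radius centred at the origin (assumption)
BALANCE_WEIGHT = 1.0  # hypothetical weight on the force-balance term


def grasp_loss_and_grad(points):
    """Return (loss, gradients) for a list of 2D fingertip positions.

    loss = sum_i (|p_i| - RADIUS)^2 + BALANCE_WEIGHT * |sum_i u_i|^2,
    where u_i = p_i / |p_i| is the unit direction of fingertip i.
    The first term drives fingertips onto the surface; the second is zero
    when equal-magnitude normal forces at the contacts cancel out.
    """
    norms = [math.hypot(x, y) for x, y in points]
    units = [(x / n, y / n) for (x, y), n in zip(points, norms)]
    sx = sum(u[0] for u in units)
    sy = sum(u[1] for u in units)

    loss = sum((n - RADIUS) ** 2 for n in norms)
    loss += BALANCE_WEIGHT * (sx * sx + sy * sy)

    grads = []
    for (ux, uy), n in zip(units, norms):
        # Gradient of the surface term: 2 (|p| - R) u.
        radial = 2.0 * (n - RADIUS)
        # Gradient of the balance term: 2 w (I - u u^T) s / |p|.
        dot = ux * sx + uy * sy
        bx = 2.0 * BALANCE_WEIGHT * (sx - dot * ux) / n
        by = 2.0 * BALANCE_WEIGHT * (sy - dot * uy) / n
        grads.append((radial * ux + bx, radial * uy + by))
    return loss, grads


def optimize_grasp(points, step=0.05, iterations=1000):
    """Plain gradient descent on the toy grasp loss."""
    for _ in range(iterations):
        _, grads = grasp_loss_and_grad(points)
        points = [
            (x - step * gx, y - step * gy)
            for (x, y), (gx, gy) in zip(points, grads)
        ]
    return points


# Start with all three fingertips bunched on one side, away from the surface.
initial = [(2.0, 0.1), (1.5, 1.0), (0.5, 2.0)]
initial_loss, _ = grasp_loss_and_grad(initial)
final = optimize_grasp(initial)
final_loss, _ = grasp_loss_and_grad(final)
```

Under these assumptions the fingertips settle on the disc boundary and spread apart until their contact directions balance. A real differentiable simulator would replace the hand-written loss with simulated contact dynamics over the full hand and object geometry, but the optimization loop has the same shape: evaluate an objective, take its gradient with respect to the grasp parameters, and step.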
