Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation

Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson, Animesh Garg

TL;DR

The paper addresses the data bottleneck in multi-finger grasping by introducing Fast-Grasp'D, a differentiable grasping simulator, and Grasp'D-1M, a large-scale dataset generated with it. Differentiable simulation enables gradient-based optimization to search the full grasp space while simulating thousands of contacts, producing more stable and contact-rich grasps than analytic baselines. Grasp'D-1M covers three-, four-, and five-fingered robotic hands with multimodal visual inputs, and its grasps score substantially higher than those from GraspIt! and prior differentiable methods on grasp quality metrics such as the epsilon quality $\epsilon$ and the grasp wrench space volume $Vol$. Training a vision-based grasp predictor on Grasp'D-1M demonstrates practical impact: the retrained model predicts grasps with ~30% more contact area, ~33% higher $\epsilon$, and ~35% lower simulated displacement, underscoring the value of differentiable data synthesis for downstream perception tasks.
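The idea of searching grasp space by gradient descent on a simulated objective can be illustrated with a deliberately simplified toy. This summary does not specify Fast-Grasp'D's objective or contact model, so the sketch below is an assumption throughout: the object is a sphere described by its signed distance function (SDF), the only free variables are a few hypothetical contact points, and the hypothetical loss simply pulls each point onto the surface. A real differentiable simulator would obtain the gradients by automatic differentiation through its contact model rather than from the hand-derived formula used here.

```python
import math

def sphere_sdf(p, radius):
    """Signed distance from point p to a sphere of the given radius centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def sdf_grad(p):
    """Gradient of the sphere SDF: the unit vector pointing away from the centre.
    Undefined at the origin, so points must not start exactly at the centre."""
    norm = math.sqrt(sum(c * c for c in p))
    return [c / norm for c in p]

def contact_loss(points, radius):
    """Hypothetical toy objective: squared distance of each contact point to the surface."""
    return sum(sphere_sdf(p, radius) ** 2 for p in points)

def optimize_contacts(points, radius, lr=0.1, steps=200):
    """Plain gradient descent that pulls every contact point onto the object surface.
    The input list is copied, not modified."""
    points = [list(p) for p in points]
    for _ in range(steps):
        for p in points:
            d = sphere_sdf(p, radius)
            g = sdf_grad(p)
            # d/dp of d**2 is 2 * d * grad(d)
            for i in range(3):
                p[i] -= lr * 2.0 * d * g[i]
    return points

if __name__ == "__main__":
    start = [[0.0, 0.0, 2.0], [1.5, 0.0, 0.0], [0.0, -0.3, 0.0]]  # outside, outside, inside
    final = optimize_contacts(start, radius=1.0)
    print(contact_loss(start, 1.0), contact_loss(final, 1.0))
```

Points outside the sphere move inward and the point inside moves outward until all lie on the surface. The paper's setting differs in scale and content (a full articulated hand, thousands of contacts, and an objective computed by the simulator), but the update rule has the same shape: step the free parameters against the gradient of a differentiable grasp objective.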

Abstract

Multi-finger grasping relies on high-quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three-, four- and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M, and showing a dramatic increase in model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.
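The epsilon metric cited above is conventionally the Ferrari-Canny quality: the radius of the largest ball centred at the origin that fits inside the convex hull of the contact wrenches, and zero when the origin lies outside that hull (the grasp is not force-closure). The sketch below is a simplified planar illustration under that assumption. It uses only 2D force vectors and ignores torques and friction cones, so it shows the geometry of the metric rather than how the paper computes it in the full 6D wrench space.

```python
import math

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def epsilon_quality_2d(wrenches):
    """Radius of the largest origin-centred disc inside the hull of the 2D wrenches.
    Returns 0.0 if the origin is not strictly inside the hull."""
    hull = convex_hull(wrenches)
    if len(hull) < 3:
        return 0.0
    best = math.inf
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        # For a counter-clockwise hull, the origin is inside iff it lies left of every edge.
        side = cross(a, b, (0.0, 0.0))
        if side <= 0:
            return 0.0
        # |(b - a) x (origin - a)| / |b - a| is the distance from the origin to the edge's line.
        best = min(best, side / math.hypot(b[0] - a[0], b[1] - a[1]))
    return best

if __name__ == "__main__":
    # Four unit forces along +x, +y, -x, -y form a diamond hull.
    print(epsilon_quality_2d([(1, 0), (0, 1), (-1, 0), (0, -1)]))
```

For the four unit forces the hull is a diamond whose edges lie at distance 1/sqrt(2) from the origin, so the quality is about 0.707; doubling every force doubles it. Grasps with more, better-spread contacts tend to enlarge the wrench hull around the origin, which is why contact-rich grasps can raise this metric.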

Paper Structure (31 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: The Grasp'D-1M dataset contains one million unique grasps, each with multi-modal visual inputs for training vision-based robotic grasping. We synthesize these grasps with a new differentiable grasping simulator, Fast-Grasp'D. Gradient information accelerates the grasp search, allowing us to search the full-DOF space (without eigengrasps) and simulate thousands of contacts to produce a dataset of contact-rich, stable grasps that can improve any learned grasping pipeline.
  • Figure 2: Our grasp synthesis pipeline generates the Grasp'D-1M dataset of one million unique grasps in three stages. (1) Grasp generation: For any provided $(\textrm{robot hand}, \textrm{object})$ pair, we generate a set of base grasps by gradient descent over an objective computed by Fast-Grasp'D, our fast and differentiable grasping simulator. (2) Scene generation: We simulate multiple drops of each object onto a table to create scenes with different object poses and transfer base grasps to these scenes. (3) Rendering: Finally, we render each scene (RGB, depth, segmentation, 2D/3D bounding boxes in mono+stereo) from multiple camera angles.
  • Figure 3: Grasp metrics such as epsilon quality, GWS volume, and contact surface area depend on the threshold distance used for contact generation. Our method improves on the GraspIt! [miller2004graspit] and MultiDex [li2022gendexgrasp] baselines under all threshold choices. Results for the Barrett and Allegro hands (available on our website) follow a similar trend.
  • Figure 4: Our method, which optimizes over the full DOF space of the hand, generates contact-rich grasps. The GraspIt! [miller2004graspit] planner mainly generates fingertip grasps. The baseline grasps have fewer contacts, which reduces their stability compared with grasps synthesized by our method.
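Figure 3's observation that contact-based metrics depend on the contact-generation threshold can be made concrete with a toy example. The sketch below assumes that contacts are generated by thresholding the distance between sampled hand-surface points and the object, and that each sample stands for a fixed patch of surface area. The SurfaceSample type, the sample distances, and the per-sample area are hypothetical illustrative values, not data or code from the paper.

```python
from dataclasses import dataclass

@dataclass
class SurfaceSample:
    """Hypothetical hand-surface sample: distance to the object and the area it represents."""
    distance: float  # metres; negative values mean penetration
    area: float      # square metres

def contacts_at_threshold(samples, threshold):
    """Samples counted as contacts: those no farther than `threshold` from the object."""
    return [s for s in samples if s.distance <= threshold]

def contact_area(samples, threshold):
    """Total hand-surface area treated as in contact for the given threshold."""
    return sum(s.area for s in contacts_at_threshold(samples, threshold))

if __name__ == "__main__":
    # Illustrative distances only: a few samples touching, more hovering slightly above.
    distances = [-0.0005, 0.0, 0.001, 0.002, 0.004, 0.008, 0.015]
    samples = [SurfaceSample(d, 1e-5) for d in distances]
    for t in (0.001, 0.005, 0.01):
        print(t, len(contacts_at_threshold(samples, t)), contact_area(samples, t))
```

With these made-up distances the contact count grows from 3 to 5 to 6 as the threshold widens, and the contact area grows with it. Any metric built from the contact set, including epsilon quality and GWS volume, shifts in the same way, which is why comparing methods across a sweep of thresholds, as Figure 3 does, avoids favouring one method through a single arbitrary choice.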