Deep Differentiable Grasp Planner for High-DOF Grippers

Min Liu, Zherong Pan, Kai Xu, Kanishka Ganguly, Dinesh Manocha

TL;DR

The paper tackles robust grasping with high-DOF robotic hands by proposing a differentiable grasp planner that combines a generalized $Q_1$ metric, geometry-aware losses, and a differentiable forward-kinematics pipeline. By extending $Q_1$ to ambient space with inexact contacts and deriving both upper-bound and lower-bound gradient computations (via SOS/SDP sensitivity), the method enables end-to-end training from multi-view depth inputs using a small dataset. The key contributions are a suite of differentiable losses for collision, self-collision, and surface proximity; a robust gradient computation on triangle meshes; and a training paradigm that yields directly deployable grasps on unseen objects, validated on real hardware with a 22% higher success rate and a 0.12 higher $Q_1$ value. The approach reduces data requirements for dexterous manipulation with high-DOF grippers in real-world settings.

Abstract

We present an end-to-end algorithm for training deep neural networks to grasp novel objects. Our algorithm builds all the essential components of a grasping system using a forward-backward automatic differentiation approach, including the forward kinematics of the gripper, the collision between the gripper and the target object, and the metric for grasp poses. In particular, we show that a generalized $Q_1$ grasp metric is defined and differentiable for inexact grasps generated by a neural network, and that the derivatives of our generalized $Q_1$ metric can be computed from a sensitivity analysis of the induced optimization problem. We also show that the derivatives of the (self-)collision terms can be efficiently computed from a low-quality watertight triangle mesh. Altogether, our algorithm allows for the computation of grasp poses for high-DOF grippers in an unsupervised mode with no ground truth data, or it improves the results in a supervised mode using a small dataset. Our new learning algorithm significantly simplifies the data preparation for learning-based grasping systems and leads to higher-quality learned grasps on common 3D shape datasets [7, 49, 26, 25], achieving a 22% higher success rate on physical hardware and a 0.12 higher value on the $Q_1$ grasp quality metric.
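As background for the grasp metric mentioned above, the sketch below estimates the classical $Q_1$ value, written as $\min_{\|\mathbf{d}\|=1} \max_j \langle \mathbf{d}, \mathbf{w}_j \rangle$, for a planar toy problem. It is not the paper's generalized metric or its SOS/SDP-based gradient computation. The function name `q1_sampled`, the use of 2D force vectors in place of 6D wrenches, and the uniform direction sampling are all assumptions made for illustration.

```python
import math

def q1_sampled(wrenches, num_dirs=360):
    """Estimate Q1 = min over unit directions d of max_j <d, w_j>.

    Hypothetical helper: `wrenches` is a list of 2D vectors standing in
    for 6D contact wrenches. Because only `num_dirs` directions are
    sampled, the result can only over-estimate the true minimum.
    """
    best = math.inf
    for k in range(num_dirs):
        ang = 2.0 * math.pi * k / num_dirs
        d = (math.cos(ang), math.sin(ang))
        # Support function of the wrench set in direction d.
        support = max(d[0] * w[0] + d[1] * w[1] for w in wrenches)
        best = min(best, support)
    return best

# Four unit vectors spanning a diamond around the origin: the largest
# origin-centered disc inside the diamond has radius 1/sqrt(2).
balanced = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
# Two vectors on one side only: the origin is not enclosed, so the
# estimate is negative.
one_sided = [(1.0, 0.0), (0.0, 1.0)]

q_balanced = q1_sampled(balanced)
q_one_sided = q1_sampled(one_sided)
```

A positive value means the sampled wrenches can resist a disturbance from every sampled direction; a non-positive value flags a grasp that cannot. The hard `min` and `max` are what make the exact metric awkward to differentiate directly.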

Paper Structure

This paper contains 21 sections, 1 theorem, 40 equations, 10 figures, 2 tables.

Key Result

Lemma 9.1

The dual cone of $\mathcal{K}_\mathcal{W}$ is $\mathcal{K}_\mathcal{W}^*$.
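For reference, the standard convex-analysis definition of a dual cone is given below. This page does not reproduce the paper's definitions of $\mathcal{K}_\mathcal{W}$ or $\mathcal{K}_\mathcal{W}^*$, so reading the lemma against this generic definition, with a generic cone $\mathcal{K}$ and the Euclidean inner product, is an assumption.

```latex
\mathcal{K}^{*} = \left\{\, \mathbf{y} \;:\; \langle \mathbf{x}, \mathbf{y} \rangle \ge 0 \quad \forall\, \mathbf{x} \in \mathcal{K} \,\right\}
```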

Figures (10)

  • Figure 1: Using a small dataset, we train an end-to-end neural network to predict grasp poses for novel objects that it has never seen before. The neural network prediction is adjusted using our differentiable grasp quality metric.
  • Figure 2: Our learning architecture takes multi-view depth images of the object as inputs. Features are extracted from these images using ResNet-50 and, after view pooling [su2015multi], fed into fully connected (FC) blocks that directly predict the high-DOF configuration of a gripper. The predicted configuration is then passed through a forward kinematics (FK) block, which maps it into Euclidean space. We then execute grasps with these configurations on a physical platform. During the training stage, we can formulate various requirements for a grasp planner as loss functions in Euclidean space (red), including (self-)collision avoidance, grasp quality maximization, data consistency, and closeness between the gripper and the target object's surface. Our method can be used as a locally optimal grasp planner guided by analytic gradients, or as an additional loss function to improve the quality of learned grasp poses.
  • Figure 3: Variables used to define our generalized $Q_1$ metric.
  • Figure 4: Three cases in the computation of ${\partial{\mathbf{d}(\mathbf{p}_i,\mathcal{T})}}/{\partial{\mathbf{p}_i}}$. (a): The geometric feature is an edge $\mathbf{e}$. (b): The geometric feature is a vertex $\mathbf{v}$. (c): The geometric feature is a triangle.
  • Figure 5: Left: The real Shadow Hand. Middle: Original meshes of the Shadow Hand. Right: Convex hulls of each part of the Shadow Hand meshes, the sampled potential grasp points (red), and the sampled potential contact points (green) via Poisson disk sampling.
  • ...and 5 more figures
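The three cases named in the Figure 4 caption (closest feature is an edge, a vertex, or the triangle interior) can be illustrated with a small sketch. It assumes the distance is the unsigned Euclidean distance from a point to a single triangle, in which case the gradient with respect to the point is the unit vector from the closest point to the query point, whichever feature is closest. The paper's exact distance definition is not shown on this page, and all function names below are hypothetical. The region tests follow the well-known closest-point-on-triangle routine from Ericson's "Real-Time Collision Detection".

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def add_scaled(a, b, s):
    return (a[0] + s * b[0], a[1] + s * b[1], a[2] + s * b[2])

def closest_point_on_triangle(p, a, b, c):
    """Return (closest point, feature) for a non-degenerate triangle abc."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a, "vertex"
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b, "vertex"
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return add_scaled(a, ab, d1 / (d1 - d3)), "edge"
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c, "vertex"
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return add_scaled(a, ac, d2 / (d2 - d6)), "edge"
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return add_scaled(b, sub(c, b), w), "edge"
    # Interior case: project onto the plane via barycentric coordinates.
    denom = 1.0 / (va + vb + vc)
    return add_scaled(add_scaled(a, ab, vb * denom), ac, vc * denom), "triangle"

def distance_and_gradient(p, a, b, c):
    """Unsigned distance from p to triangle abc and its gradient w.r.t. p."""
    q, feature = closest_point_on_triangle(p, a, b, c)
    diff = sub(p, q)
    dist = math.sqrt(dot(diff, diff))
    if dist == 0.0:
        raise ValueError("gradient is undefined when p lies on the triangle")
    grad = (diff[0] / dist, diff[1] / dist, diff[2] / dist)
    return dist, grad, feature

# One query point per case, against a right triangle in the z = 0 plane.
A, B, C = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
face_case = distance_and_gradient((0.25, 0.25, 2.0), A, B, C)
edge_case = distance_and_gradient((0.5, -1.0, 0.0), A, B, C)
vertex_case = distance_and_gradient((-1.0, -1.0, 0.0), A, B, C)
```

For a full mesh, one would take the minimum over triangles (typically with a spatial acceleration structure) and reuse the gradient of the winning triangle; a signed distance would additionally need an inside/outside test, which this sketch omits.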

Theorems & Definitions (1)

  • Lemma 9.1