DexDiffuser: Generating Dexterous Grasps with Diffusion Models

Zehang Weng; Haofei Lu; Danica Kragic; Jens Lundell

Abstract

We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iteratively denoising randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion (EGD) and Evaluator-based Sampling Refinement (ESR). Experimental results demonstrate that DexDiffuser consistently outperforms the state-of-the-art multi-finger grasp generation method FFHNet, with grasp success rates that are, on average, 9.12% and 19.44% higher in simulation and real-robot experiments, respectively. Supplementary materials are available at https://yulihn.github.io/DexDiffuser_page/

Paper Structure (21 sections, 6 equations, 5 figures, 4 tables)

Introduction
Related Work
Data-Driven Dexterous Grasping
Diffusion Models in Robotics
Problem Statement
Method
The Basis Point Set Representation
Diffusion-based Grasp Sampler
Grasp Evaluator
Grasp Refinement
Evaluator-Guided Diffusion (EGD)
Evaluator-based Sampling Refinement (ESR)
Implementation Details
Dataset
Experimental Evaluation
...and 6 more sections

Figures (5)

Figure 1: DexDiffuser pipeline. Given a partial point cloud captured by a Kinect v3, DexSampler generated a set of high-quality grasps by gradually removing noise from randomly sampled grasps. Subsequently, the grasps were refined and ranked by the score $\mathsf{P}$ from DexEvaluator. The grasp with the highest score was finally selected and executed on the real robot.
Figure 2: Model architecture. The left image shows the DexSampler. Its input $\mathbf{f_{O}}$ is processed into a key-value pair while $\mathbf{g}_t$ and $t$ are processed into a query using a self-attention block. The key-value-query triplet is then embedded using a cross-attention block to compute $\hat{\mathbf{\epsilon}}_t$. The right image shows the DexEvaluator that predicts grasp success probability given the same $\mathbf{f_{O}}$ as the DexSampler and $\mathbf{g}_0$ produced by the DexSampler.
Figure 3: Qualitative evaluation of DexSampler-bps on objects from different datasets. The grasps, shown in pink, are generated on the partially observed point clouds, shown in red, rendered from the complete meshes, shown in blue.
Figure 4: Experimental objects. From 1 to 9: cracker box, sugar box, mustard bottle, bleach cleanser, sprayer, metal mug, goblet, toy plane, and Pringles.
Figure 5: Successful and failed grasps from DexSampler-bps-esr-2. The first two images show successful grasps of the bleach cleanser and the goblet, while the latter two show different failure cases on the metal mug and the Pringles.
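
The pipeline summarized in Figure 1 (sample grasps by iterative denoising, score them, execute the best one) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the update rule is generic DDPM-style ancestral sampling, the linear noise schedule and the 8-dimensional grasp vector are arbitrary, and `toy_noise_predictor` and `toy_evaluator` are hypothetical stand-ins for the learned DexSampler and DexEvaluator networks. The refinement step (EGD or ESR) is omitted.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_noise_predictor(g_t, t, f_O, alpha_bars):
    """Hypothetical stand-in for the learned noise-prediction network.

    It pretends the only valid grasp for the object is the vector f_O, so the
    noise in g_t = sqrt(abar_t) * g_0 + sqrt(1 - abar_t) * eps is known exactly.
    """
    return (g_t - np.sqrt(alpha_bars[t]) * f_O) / np.sqrt(1.0 - alpha_bars[t])


def toy_evaluator(grasps, f_O):
    """Hypothetical stand-in for a grasp evaluator; returns scores in (0, 1)."""
    sq_dist = np.sum((grasps - f_O) ** 2, axis=-1)
    return 1.0 / (1.0 + np.exp(sq_dist - 1.0))


def generate_and_select(f_O, num_grasps=16, num_steps=50, seed=0):
    """Denoise random grasps conditioned on f_O, then return the top-scoring one."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from pure Gaussian noise: one row per candidate grasp.
    g = rng.standard_normal((num_grasps, f_O.shape[0]))

    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(g, t, f_O, alpha_bars)
        # Posterior mean of the DDPM reverse step.
        mean = (g - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(g.shape) if t > 0 else 0.0
        g = mean + np.sqrt(betas[t]) * noise

    # Rank the denoised grasps by evaluator score and keep the best.
    scores = toy_evaluator(g, f_O)
    best = int(np.argmax(scores))
    return g[best], float(scores[best])
```

Calling `generate_and_select(np.linspace(-1.0, 1.0, 8))` returns one grasp vector and its score. Because the toy predictor is an exact oracle for a single target, every sample collapses onto that target; a learned predictor would instead produce a diverse set of candidates, which is what makes the ranking step meaningful.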
