DexDiffuser: Generating Dexterous Grasps with Diffusion Models

Zehang Weng, Haofei Lu, Danica Kragic, Jens Lundell

Abstract

We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iteratively denoising randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion (EGD) and Evaluator-based Sampling Refinement (ESR). Experimental results demonstrate that DexDiffuser consistently outperforms the state-of-the-art multi-finger grasp generation method FFHNet, achieving on average 9.12% and 19.44% higher grasp success rates in simulation and real-robot experiments, respectively. Supplementary materials are available at https://yulihn.github.io/DexDiffuser_page/
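
The "iterative denoising of randomly sampled grasps" mentioned in the abstract can be illustrated with a minimal sketch of standard DDPM ancestral sampling. This is not the authors' implementation: the linear noise schedule, the step count, the grasp dimension of 25, and the function `toy_noise_predictor` (a hand-written stand-in for a trained, point-cloud-conditioned network) are all assumptions made for illustration.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(g_t, t, f_o):
    """Hypothetical stand-in for a learned noise-prediction network.

    It treats the offset of each grasp from an object-dependent target
    as the "noise", so the loop below has something to denoise toward.
    """
    target = np.tanh(f_o[: g_t.shape[1]])
    return g_t - target

def sample_grasps(f_o, num_grasps=8, grasp_dim=25, num_steps=50, seed=0):
    """Run the DDPM reverse process from pure noise down to step 0."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    g = rng.standard_normal((num_grasps, grasp_dim))  # g_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(g, t, f_o)
        # Posterior mean of the standard DDPM reverse step.
        mean = (g - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step (t == 0).
        noise = rng.standard_normal(g.shape) if t > 0 else np.zeros_like(g)
        g = mean + np.sqrt(betas[t]) * noise
    return g

# Usage: a dummy 64-dimensional object feature stands in for the encoded point cloud.
grasps = sample_grasps(np.ones(64))
```

In a real system the stand-in predictor would be replaced by a trained network, and the sampled vectors would be decoded into hand poses and joint angles; neither step is shown here.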

Paper Structure (21 sections, 6 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: DexDiffuser pipeline. Given a partial point cloud captured by a Kinect v3, DexSampler generates a set of high-quality grasps by gradually removing noise from randomly sampled grasps. The grasps are then refined and ranked by the score $\mathsf{P}$ from DexEvaluator. Finally, the grasp with the highest score is selected and executed on the real robot.
  • Figure 2: Model architecture. The left image shows the DexSampler. Its input $\mathbf{f_{O}}$ is processed into a key-value pair while $\mathbf{g}_t$ and $t$ are processed into a query using a self-attention block. The key-value-query triplet is then embedded using a cross-attention block to compute $\hat{\mathbf{\epsilon}}_t$. The right image shows the DexEvaluator that predicts grasp success probability given the same $\mathbf{f_{O}}$ as the DexSampler and $\mathbf{g}_0$ produced by the DexSampler.
  • Figure 3: Qualitative evaluation of DexSampler-bps on objects from different datasets. The grasps, shown in pink, are generated on the partially observed point clouds, shown in red, rendered from the complete meshes, shown in blue.
  • Figure 4: Experimental objects. From 1 to 9: cracker box, sugar box, mustard bottle, bleach cleanser, sprayer, metal mug, goblet, toy plane, and Pringles can.
  • Figure 5: Successful and failed grasps from DexSampler-bps-esr-2. The first two images show successful grasps of the bleach cleanser and the goblet, while the latter two show different failure cases on the metal mug and the Pringles can.
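
The cross-attention step described in the Figure 2 caption, where a query derived from $\mathbf{g}_t$ and $t$ attends over key-value pairs derived from $\mathbf{f_{O}}$, can be sketched as single-head scaled dot-product attention. This is only an illustration of the general mechanism: the token counts, the embedding width `d_model`, and the randomly initialized projection matrices are assumptions, and the actual DexSampler layers may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries from one source, keys/values from another."""
    q = query_tokens @ w_q        # (n_q, d)
    k = context_tokens @ w_k      # (n_c, d)
    v = context_tokens @ w_v      # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot products, (n_q, n_c)
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 16                                         # assumed embedding width
grasp_tokens = rng.standard_normal((1, d_model))     # stands in for the embedded (g_t, t) query
object_tokens = rng.standard_normal((4, d_model))    # stands in for the embedded f_O
# Hypothetical projection weights; a trained model would learn these.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

out, attn = cross_attention(grasp_tokens, object_tokens, w_q, w_k, w_v)
```

Here `out` plays the role of the attended feature from which a further head would predict the noise estimate, and `attn` shows how strongly the grasp query weights each object token.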