
DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation

Qian Feng, David S. Martinez Lema, Mohammadhossein Malmir, Hang Li, Jianxiang Feng, Zhaopeng Chen, Alois Knoll

Abstract

We introduce DexGanGrasp, a dexterous grasp synthesis method that generates and evaluates grasps from a single view in real time. DexGanGrasp comprises a DexGenerator based on Conditional Generative Adversarial Networks (cGANs), which generates dexterous grasps, and a discriminator-like DexEvaluator, which assesses the stability of these grasps. Extensive simulation and real-world experiments demonstrate the effectiveness of the proposed method, which outperforms the baseline FFHNet with an 18.57% higher success rate in real-world evaluation. We further extend DexGanGrasp to DexAfford-Prompt, an open-vocabulary affordance grounding pipeline for dexterous grasping that leverages Multimodal Large Language Models (MLLMs) and Vision Language Models (VLMs) to achieve task-oriented grasping, with successful real-world deployments.
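As background for the cGAN-based design, the standard conditional GAN minimax objective is written below in the grasp notation of Figure 2. This is the generic textbook formulation, assumed here for illustration only; the paper's actual training losses may differ. The conditioning symbol $\mathbf{o}$ (the encoded single-view observation) is introduced here and does not come from the source.

$$
\min_{G}\max_{D}\;
\mathbb{E}_{\mathbf{g}_{pos}\sim p_{\text{data}}(\mathbf{g}\mid\mathbf{o})}\big[\log D(\mathbf{g}_{pos}\mid\mathbf{o})\big]
+\mathbb{E}_{z\sim p(z)}\big[\log\big(1-D(G(z\mid\mathbf{o})\mid\mathbf{o})\big)\big]
$$

Here $G$ plays the role of DexGenerator and $D$ that of DexDiscriminator: $G$ is pushed to produce grasps $G(z\mid\mathbf{o})$ that $D$ cannot tell apart from the positive grasps $\mathbf{g}_{pos}$.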

Paper Structure (21 sections, 8 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 2: DexGanGrasp consists of a DexGenerator, a DexDiscriminator and a DexEvaluator. DexGenerator takes as input a BPS-encoded point cloud together with a sample $z$ from the latent space and generates diverse "fake" grasps $\mathbf{g}_{gen}=(\mathbf{R},\mathbf{t},\mathbf{j})$ that are as close as possible to the positive ground-truth "real" grasps $\mathbf{g}_{pos}$. Meanwhile, DexDiscriminator tries to distinguish between these two types of grasps. During inference, DexEvaluator selects the stable grasps among the grasps $\mathbf{g}$ produced by DexGenerator, and the best grasp is chosen for the robot to execute.
  • Figure 3: DexAfford-Prompt builds on DexGanGrasp as an open-vocabulary affordance grounding pipeline for task-oriented dexterous grasping. First, an RGB image and the question "What do you see?" are fed to ChatGPT 4o [openai2024chatgpt] to obtain the object name; the detailed prompt is given in the affordance prompt listing (block:affordance_prompt). This object name, together with a user-specified affordance such as "grab", is fed into ChatGPT 4 [openai2024chatgpt] to predict the name of the object part that affords the task. Next, VLPart [peize2023vlpart] segments this part in the image, and the segmentation is projected into 3D space to obtain the object-part point cloud. A -based filtering function then removes the grasps that do not target the object part. Finally, the remaining grasps are ranked with DexEvaluator for execution.
  • Figure 4: Left: the seven objects used in the general grasping experiments, namely Agile Bottle, YCB Drill, RealSense Box, Spam Can (top row) and Mustard Bottle, Green Pan, Red Mug (bottom row). Right: the four objects used specifically to test task-oriented grasping, namely Red Brush, Green Pan, Spray, and Hammer.
  • Figure 5: Part predictions from VLPart [peize2023vlpart]: successful cases in the first row and some failure cases in the second row.
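The inference flow of Figure 2, together with the filtering and ranking steps of Figure 3, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bps_encode` and `select_grasp`, the grasp dictionary layout, and the toy generator and evaluator are all hypothetical stand-ins for the trained DexGenerator and DexEvaluator. `bps_encode` implements only the basic Basis Point Set idea (distance from each fixed basis point to its nearest cloud point), and the optional `keep` predicate is a placeholder for the part filter, whose criterion the caption leaves unspecified.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical grasp layout: rotation "R", translation "t", joint vector "j".
Grasp = Dict[str, object]


def bps_encode(cloud: Sequence[Point], basis: Sequence[Point]) -> List[float]:
    """Basic Basis Point Set encoding: for each fixed basis point, the distance
    to its nearest neighbour in the point cloud (a fixed-length feature)."""
    return [min(math.dist(b, p) for p in cloud) for b in basis]


def select_grasp(
    cloud: Sequence[Point],
    basis: Sequence[Point],
    generator: Callable[[List[float], List[float]], Grasp],
    evaluator: Callable[[List[float], Grasp], float],
    num_samples: int = 64,
    latent_dim: int = 3,
    keep: Optional[Callable[[Grasp], bool]] = None,
    seed: int = 0,
) -> Optional[Tuple[float, Grasp]]:
    """Sample latent codes z, decode candidate grasps, optionally drop grasps
    rejected by `keep` (assumed stand-in for a part filter), and return the
    (score, grasp) pair with the highest evaluator score, or None."""
    rng = random.Random(seed)
    feat = bps_encode(cloud, basis)
    # One Gaussian latent sample per candidate grasp.
    candidates = [
        generator(feat, [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
        for _ in range(num_samples)
    ]
    if keep is not None:
        candidates = [g for g in candidates if keep(g)]
    if not candidates:
        return None
    # Rank by the evaluator score, a stand-in for predicted grasp stability.
    return max(((evaluator(feat, g), g) for g in candidates), key=lambda s: s[0])


# Hypothetical toy networks, for illustration only (not trained models).
def toy_generator(feat: List[float], z: List[float]) -> Grasp:
    return {
        "R": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        "t": tuple(0.05 * zi for zi in z[:3]),  # translation driven by z
        "j": [0.0] * 4,  # placeholder joint vector; real hand DOF not assumed
    }


def toy_evaluator(feat: List[float], g: Grasp) -> float:
    # Higher score for grasps closer to the origin (purely illustrative).
    return 1.0 / (1.0 + math.dist(g["t"], (0.0, 0.0, 0.0)))


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
    basis = [(0.5, 0.5, 0.5), (-0.5, 0.0, 0.0)]
    score, best = select_grasp(cloud, basis, toy_generator, toy_evaluator)
    print(f"best score {score:.3f}, translation {best['t']}")
```

Keeping the filter as a plain predicate means a task-oriented variant only has to supply a different `keep` function, while latent sampling and evaluator ranking stay unchanged.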