
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness

Yiming Zhong, Qi Jiang, Jingyi Yu, Yuexin Ma

TL;DR

DexGrasp Anything tackles universal dexterous grasping by embedding three physics-based constraints into a diffusion-based grasp generator and by augmenting its conditioning with semantic priors derived from a large language model (LLM). Physics-aware training and a physics-guided sampler together produce feasible, robust hand-object grasps, and the method achieves state-of-the-art results on five benchmarks. The large-scale DexGrasp Anything dataset contains over 3.4 million grasp poses across 15,698 objects. Together with model-in-the-loop data generation, it shows how data scale and diversity improve generalization. Real-world ShadowHand experiments and extensive cross-dataset evaluations highlight the method's practical value for versatile, robust manipulation in unstructured environments.
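The TL;DR does not spell out the three constraints or the sampler's update rule. As a rough illustration of what physics-guided diffusion sampling generally looks like, the sketch below nudges each reverse-diffusion step down the gradient of a hand-crafted energy. Everything in it is an assumption rather than the authors' method: the object is a sphere of radius R, the grasp is reduced to a few fingertip points, the energy combines a penetration penalty with a pull toward the surface, and `denoise_mean`, `energy_grad`, and `guidance_scale` are hypothetical names.

```python
import numpy as np

# Hypothetical toy setup: the object is a sphere of radius R centred at the origin,
# and a "grasp" is K fingertip points in 3D (array of shape K x 3).
R = 1.0


def penetration_energy(points: np.ndarray) -> float:
    """Sum of squared penetration depths of points lying inside the sphere."""
    depth = np.maximum(R - np.linalg.norm(points, axis=-1), 0.0)
    return float(np.sum(depth ** 2))


def energy_grad(points: np.ndarray, w_pen: float = 1.0, w_surf: float = 0.1) -> np.ndarray:
    """Analytic gradient of w_pen * penetration energy + w_surf * squared distance to the surface."""
    norms = np.maximum(np.linalg.norm(points, axis=-1, keepdims=True), 1e-8)
    unit = points / norms
    depth = np.maximum(R - norms, 0.0)
    grad_pen = -2.0 * depth * unit        # descending this pushes interior points outward
    grad_surf = 2.0 * (norms - R) * unit  # descending this pulls every point toward the surface
    return w_pen * grad_pen + w_surf * grad_surf


def guided_sample(x_T, denoise_mean, sigmas, guidance_scale=0.2, seed=0):
    """Reverse diffusion loop with an energy-gradient correction at every step.

    denoise_mean(x, t) is a hypothetical stand-in for a trained model's posterior
    mean; it is not part of any released code.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x_T, dtype=float)
    for t, sigma in enumerate(sigmas):
        mean = denoise_mean(x, t)
        mean = mean - guidance_scale * energy_grad(mean)  # physics guidance step
        x = mean + sigma * rng.standard_normal(x.shape)   # sigma = 0 leaves x = mean
    return x
```

The correction is applied to the predicted mean before noise is re-injected, so the constraint stays active at every noise level. With `lambda x, t: x` as a placeholder denoiser and `np.linspace(0.1, 0.0, 50)` as the noise schedule, the loop reduces to noisy gradient descent on the energy. That is enough to watch points that start inside the sphere get pushed out to its surface.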

Abstract

A dexterous hand capable of grasping any object is essential for the development of general-purpose embodied intelligent robots. However, because dexterous hands have many degrees of freedom and objects vary enormously, robustly generating high-quality, usable grasp poses is a significant challenge. In this paper, we introduce DexGrasp Anything, a method that effectively integrates physical constraints into both the training and sampling phases of a diffusion-based generative model, achieving state-of-the-art performance across nearly all open datasets. Additionally, we present a new dexterous grasping dataset containing over 3.4 million diverse grasp poses for more than 15k different objects, demonstrating its potential to advance universal dexterous grasping. The code of our method and our dataset will be publicly released soon.
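The abstract says the physical constraints enter training as well as sampling, but it gives no loss formula. One common pattern, shown here only as a hypothetical sketch, adds a physics penalty to the usual noise-prediction loss. The penalty is evaluated on the clean sample reconstructed from the denoiser's output, x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_pred) / sqrt(alpha_bar). The sphere-shaped object, the penetration penalty, the weight `lam`, and the function names are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Hypothetical toy object: a sphere of radius R at the origin; grasps are K x 3 point sets.
R = 1.0


def penetration_energy(points: np.ndarray) -> float:
    """Sum of squared penetration depths of points lying inside the sphere."""
    depth = np.maximum(R - np.linalg.norm(points, axis=-1), 0.0)
    return float(np.sum(depth ** 2))


def physics_aware_loss(x_t, eps, eps_pred, alpha_bar, lam=0.1):
    """Noise-prediction MSE plus a weighted physics penalty on the reconstructed clean grasp.

    x_t:       noised grasp, sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    eps:       the noise actually used to corrupt x0
    eps_pred:  the denoiser's noise prediction (hypothetical model output)
    alpha_bar: cumulative noise-schedule coefficient at this timestep
    """
    denoise_term = float(np.mean((eps_pred - eps) ** 2))
    # Invert the forward process using the *predicted* noise to estimate x0.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)
    return denoise_term + lam * penetration_energy(x0_hat)
```

Because the penalty acts on x0_hat, its gradient reaches the denoiser through `eps_pred`. A model that predicts the noise perfectly is still penalized if the grasp it implies penetrates the object.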


Paper Structure

This paper contains 26 sections, 15 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 2: t-SNE visualization of the object features in our dataset compared to existing datasets. Each point represents an object, and different markers and colors distinguish the datasets. For clarity, we randomly sample 5% of the objects from each dataset for visualization.
  • Figure 3: Qualitative visualization of grasping results in Table 2.
  • Figure 4: Visualization of the ablation study. The two rows show different views of each grasp.
  • Figure 5: Visualization of cross-dataset evaluation results shown in Table 4. The top row shows models trained on DexGraspNet, while the bottom row displays models trained on our dataset.
  • Figure 6: Real-world evaluation for our method.
  • ...and 7 more figures