$\mathcal{D(R,O)}$ Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

Zhenyu Wei, Zhixuan Xu, Jingxiang Guo, Yiwen Hou, Chongkai Gao, Zhehao Cai, Jiayu Luo, Lin Shao

TL;DR

The paper tackles cross-embodiment dexterous grasping by proposing an interaction-centric representation, $\mathcal{D(R,O)}$, and a configuration-invariant pretraining strategy. It jointly learns a $\mathcal{D(R,O)}$ predictor via a CVAE with cross-attention and recovers 6D link poses to derive joint configurations through efficient optimization, enabling fast, stable grasps across multiple hands and objects. In simulation and on real hardware, the method achieves high success rates (approximately $87$–$90\%$ in simulation and $89\%$ in the real world) and demonstrates robust performance with partial observations and zero-shot generalization to novel hands. This approach outperforms existing baselines and offers practical impact for versatile, real-time dexterous manipulation.

Abstract

Dexterous grasping is a fundamental yet challenging skill in robotic manipulation, requiring precise interaction between robotic hands and objects. In this paper, we present $\mathcal{D(R,O)}$ Grasp, a novel framework that models the interaction between the robotic hand in its grasping pose and the object, enabling broad generalization across various robot hands and object geometries. Our model takes the robot hand's description and object point cloud as inputs and efficiently predicts kinematically valid and stable grasps, demonstrating strong adaptability to diverse robot embodiments and object geometries. Extensive experiments conducted in both simulated and real-world environments validate the effectiveness of our approach, with significant improvements in success rate, grasp diversity, and inference speed across multiple robotic hands. Our method achieves an average success rate of 87.53% in simulation in less than one second, tested across three different dexterous robotic hands. In real-world experiments using the LeapHand, the method also demonstrates an average success rate of 89%. $\mathcal{D(R,O)}$ Grasp provides a robust solution for dexterous grasping in complex and varied environments. The code, appendix, and videos are available on our project website at https://nus-lins-lab.github.io/drograspweb/.
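
The step from an interaction representation to a 6D link pose can be illustrated with a small NumPy sketch. It rests on an assumption that this page does not state explicitly: that $\mathcal{D(R,O)}$ is a matrix of point-to-point distances between robot-hand points and object points. Under that assumption, each robot point can be located from its distances to the object points (multilateration), and a link's rigid pose can then be fitted to the recovered points (Kabsch alignment). All function names, point counts, and the synthetic data are hypothetical, and this is not the authors' implementation; the learned predictor and the joint-value optimization are omitted.

```python
import numpy as np

def pairwise_distances(robot_pts, object_pts):
    """Return the (N_r, N_o) matrix of Euclidean distances between two point sets."""
    diff = robot_pts[:, None, :] - object_pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def multilaterate(anchors, dists):
    """Locate one 3D point from its distances to known anchor points.

    Subtracting the first sphere equation from the others gives a linear
    system in the unknown point, solved here by least squares.
    """
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(anchors[0] ** 2)
         - dists[1:] ** 2 + dists[0] ** 2)
    point, *_ = np.linalg.lstsq(A, b, rcond=None)
    return point

def fit_rigid_transform(src, dst):
    """Kabsch alignment: find R, t minimizing ||src @ R.T + t - dst||."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic stand-ins (hypothetical): one link's points and an object cloud.
rng = np.random.default_rng(0)
link_rest = rng.normal(size=(50, 3))
object_pts = rng.normal(size=(80, 3))

# Ground-truth grasp pose of the link, used only to build the example.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.05])
link_posed = link_rest @ R_true.T + t_true

# Assumed form of the representation: robot-to-object distance matrix.
D = pairwise_distances(link_posed, object_pts)

# Recover each robot point from its row of distances, then fit the 6D pose.
recovered = np.stack([multilaterate(object_pts, row) for row in D])
R_est, t_est = fit_rigid_transform(link_rest, recovered)
```

With exact distances the recovered pose matches the ground truth up to numerical error. A predicted distance matrix would be noisy, which is one reason a least-squares formulation is a natural fit for both steps.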

Paper Structure (41 sections, 11 equations, 12 figures, 6 tables)

Figures (12)

  • Figure 1: Our model uses configuration-invariant pretraining, predicts the $\mathcal{D(R,O)}$ representation, and produces cross-embodiment grasps from point cloud input.
  • Figure 2: Overview of the $\mathcal{D(R,O)}$ framework: We first pretrain the robot encoder with the proposed configuration-invariant pretraining method. Then, we predict the $\mathcal{D(R,O)}$ representation between the robot and object point clouds. Finally, we extract joint values from the $\mathcal{D(R,O)}$ representation.
  • Figure 3: Motivation for configuration-invariant pretraining.
  • Figure 4: Visualization of generated grasps, compared to typical failure cases from existing approaches.
  • Figure 5: Diverse and pose-controllable grasp generation. The arrow refers to the input palm orientation. Arrows and hands of the same color represent corresponding input-output pairs.
  • ...and 7 more figures