$\mathcal{D(R,O)}$ Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

Zhenyu Wei; Zhixuan Xu; Jingxiang Guo; Yiwen Hou; Chongkai Gao; Zhehao Cai; Jiayu Luo; Lin Shao

TL;DR

The paper tackles cross-embodiment dexterous grasping by proposing an interaction-centric representation, $\mathcal{D(R,O)}$, and a configuration-invariant pretraining strategy. It trains a conditional variational autoencoder (CVAE) with cross-attention to predict $\mathcal{D(R,O)}$, recovers the 6D pose of each hand link from the prediction, and derives joint configurations through efficient optimization, enabling fast, stable grasps across multiple hands and objects. The method achieves success rates of approximately $87$–$90\%$ in simulation and $89\%$ on real hardware, remains robust under partial observations, and generalizes zero-shot to novel hands. It outperforms existing baselines and is practical for versatile, real-time dexterous manipulation.
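The summary does not spell out what $\mathcal{D(R,O)}$ contains. The sketch below assumes it is a pairwise Euclidean distance matrix between points on the hand in its grasp pose and points on the object. Under that assumption, the matrix alone pins down the hand geometry: each robot point can be recovered from its row of distances by linear least-squares multilateration. The function names are hypothetical, and the point clouds are random stand-ins rather than real hand or object data.

```python
import numpy as np


def pairwise_distance_matrix(robot_pts: np.ndarray, object_pts: np.ndarray) -> np.ndarray:
    """Return D with D[i, j] = ||robot_pts[i] - object_pts[j]||.

    Assumption: D(R,O) is modeled as a plain Euclidean distance matrix between
    the robot point cloud (hand in its grasp pose) and the object point cloud.
    """
    diff = robot_pts[:, None, :] - object_pts[None, :, :]  # shape (N_r, N_o, 3)
    return np.linalg.norm(diff, axis=-1)


def multilaterate(object_pts: np.ndarray, dists: np.ndarray) -> np.ndarray:
    """Recover one 3D point x from its distances to known object points.

    Subtracting the sphere equation for o_0 from the one for o_j gives the
    linear system 2 (o_j - o_0)^T x = |o_j|^2 - |o_0|^2 - d_j^2 + d_0^2,
    solved in the least-squares sense so noisy distances are averaged out.
    """
    o0, d0 = object_pts[0], dists[0]
    A = 2.0 * (object_pts[1:] - o0)
    b = (np.sum(object_pts[1:] ** 2, axis=1) - np.sum(o0 ** 2)
         - dists[1:] ** 2 + d0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


def recover_robot_points(object_pts: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Apply multilateration to every row of D (one row per robot point)."""
    return np.stack([multilaterate(object_pts, row) for row in D])


# Random stand-ins for the object and hand point clouds (units: meters).
rng = np.random.default_rng(0)
object_pts = rng.uniform(-0.05, 0.05, size=(512, 3))
robot_pts = rng.uniform(-0.10, 0.10, size=(64, 3))

D = pairwise_distance_matrix(robot_pts, object_pts)  # shape (64, 512)
recovered = recover_robot_points(object_pts, D)
print("max recovery error:", np.abs(recovered - robot_pts).max())
```

With exact distances, the recovered points match the originals up to floating-point round-off. With a predicted, noisy matrix, the over-determined least-squares solve spreads the error across all object points. That is one plausible reason a dense robot-to-object matrix makes a convenient prediction target.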

Abstract

Dexterous grasping is a fundamental yet challenging skill in robotic manipulation, requiring precise interaction between robotic hands and objects. In this paper, we present $\mathcal{D(R,O)}$ Grasp, a novel framework that models the interaction between the robotic hand in its grasping pose and the object, enabling broad generalization across various robot hands and object geometries. Our model takes the robot hand's description and object point cloud as inputs and efficiently predicts kinematically valid and stable grasps, demonstrating strong adaptability to diverse robot embodiments and object geometries. Extensive experiments conducted in both simulated and real-world environments validate the effectiveness of our approach, with significant improvements in success rate, grasp diversity, and inference speed across multiple robotic hands. Our method achieves an average success rate of 87.53% in simulation in less than one second, tested across three different dexterous robotic hands. In real-world experiments using the LeapHand, the method also demonstrates an average success rate of 89%. $\mathcal{D(R,O)}$ Grasp provides a robust solution for dexterous grasping in complex and varied environments. The code, appendix, and videos are available on our project website at https://nus-lins-lab.github.io/drograspweb/.

Paper Structure

This paper contains 41 sections, 11 equations, 12 figures, and 6 tables.

Introduction
Related Work
Learning-based Robotic Dexterous Grasping
Learning Robotic Hand Features
Method
Configuration-Invariant Pretraining
D(R, O) Prediction
Grasp Configuration Generation from D(R, O)
Loss Function
Experiments
Evaluation Metric
Dataset
Overall Performance
Diverse Grasp Synthesis
Configuration Correspondence Learning
...and 26 more sections

Figures (12)

Figure 1: Our model uses configuration-invariant pretraining, predicts the $\mathcal{D(R,O)}$ representation, and generates cross-embodiment grasps from point cloud input.
Figure 2: Overview of the $\mathcal{D(R, O)}$ framework: We first pretrain the robot encoder with the proposed configuration-invariant pretraining method. Then, we predict the $\mathcal{D(R, O)}$ representation between the robot and object point clouds. Finally, we extract joint values from the predicted $\mathcal{D(R, O)}$ representation.
Figure 3: Motivation for configuration-invariant pretraining.
Figure 4: Visualization of generated grasps, compared to typical failure cases from existing approaches.
Figure 5: Diverse and pose-controllable grasp generation. The arrow refers to the input palm orientation. Arrows and hands of the same color represent corresponding input-output pairs.
...and 7 more figures
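The final step in Figure 2, extracting joint values, first requires a pose for each hand link. The sketch below assumes that the robot points recovered from $\mathcal{D(R,O)}$ can be grouped by link and matched to that link's points in its own frame. The Kabsch (SVD) algorithm then gives the least-squares rigid 6D pose of the link. For a single revolute joint, the angle can be read directly off the relative rotation between the parent and child links. This closed-form step is a standard substitute shown only for illustration. The paper derives joint configurations through an optimization that is not reproduced here, and all names below are hypothetical.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid fit: R, t minimizing sum_i ||R @ src[i] + t - dst[i]||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, det(R) = +1
    t = dst_c - R @ src_c
    return R, t


def rotation_z(theta: float) -> np.ndarray:
    """Rotation matrix about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def revolute_angle_about_z(R_parent: np.ndarray, R_child: np.ndarray) -> float:
    """Hypothetical helper: angle q of a revolute joint about the parent's z-axis,
    assuming R_child = R_parent @ rotation_z(q)."""
    R_rel = R_parent.T @ R_child
    return float(np.arctan2(R_rel[1, 0], R_rel[0, 0]))


# Hypothetical link: 32 points in the link's own frame, then posed in the world.
rng = np.random.default_rng(1)
link_canonical = rng.normal(scale=0.02, size=(32, 3))
R_true = rotation_z(0.7)
t_true = np.array([0.05, -0.02, 0.10])
link_observed = link_canonical @ R_true.T + t_true  # row-wise R_true @ p + t_true

R_est, t_est = kabsch(link_canonical, link_observed)
q = revolute_angle_about_z(np.eye(3), R_est)         # parent frame = world frame here
print("recovered joint angle:", q)
```

Fitting each link independently makes pose recovery cheap and parallel. The per-link poses may not agree exactly with the hand's kinematic chain, which is why a final optimization over joint values, as the paper describes, is still needed to obtain a kinematically valid grasp.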
