Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning

Heng Zhang, Kevin Yuchen Ma, Mike Zheng Shou, Weisi Lin, Yan Wu

TL;DR

This work tackles cross-embodiment dexterous grasp generation with an eigengrasp-based, end-to-end framework that derives a morphology embedding and hand-specific eigengrasps from a hand's URDF. An amplitude predictor, conditioned on object geometry and wrist pose, outputs coefficients that are decoded into full joint articulations, supervised by a Kinematic-Aware Articulation Loss that emphasizes fingertip-relevant motions. In simulation across three dexterous hands, the model attains a 91.9% average grasp success rate on unseen objects with less than 0.4 seconds of inference per grasp. With few-shot adaptation to an unseen hand, it achieves 85.6% success on unseen objects in simulation and an 87% success rate in real-world experiments. These results point to scalable cross-embodiment grasp generation that does not depend on large-scale training for each specific hand, supporting practical dexterous manipulation across diverse robotic morphologies.

Abstract

Dexterous grasping with multi-fingered hands remains challenging due to high-dimensional articulations and the cost of optimization-based pipelines. Existing end-to-end methods require training on large-scale datasets for specific hands, limiting their ability to generalize across different embodiments. We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation. From a hand's morphology description, we derive a morphology embedding and an eigengrasp set. Conditioned on these, together with the object point cloud and wrist pose, an amplitude predictor regresses articulation coefficients in a low-dimensional space, which are decoded into full joint articulations. Articulation learning is supervised with a Kinematic-Aware Articulation Loss (KAL) that emphasizes fingertip-relevant motions and injects morphology-specific structure. In simulation on unseen objects across three dexterous hands, our model attains a 91.9% average grasp success rate with less than 0.4 seconds inference per grasp. With few-shot adaptation to an unseen hand, it achieves 85.6% success on unseen objects in simulation, and real-world experiments on this few-shot generalized hand achieve an 87% success rate. The code and additional materials will be made available upon publication on our project website https://connor-zh.github.io/cross_embodiment_dexterous_grasping.
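
The decoding step described above, where low-dimensional amplitudes are turned back into a full joint vector, can be sketched as a linear combination of eigengrasps. This is a minimal illustration, not the authors' implementation: the function name `decode_articulation`, the toy 4-joint hand, and the clipping to joint limits are all assumptions, and the abstract does not say whether a mean pose or offset is also added.

```python
from typing import List, Sequence


def decode_articulation(
    amplitudes: Sequence[float],
    eigengrasps: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Decode K amplitudes into J joint angles: q_j = sum_k a_k * E[k][j]."""
    if len(amplitudes) != len(eigengrasps):
        raise ValueError("need exactly one amplitude per eigengrasp")
    num_joints = len(lower)
    q = [0.0] * num_joints
    for a_k, e_k in zip(amplitudes, eigengrasps):
        if len(e_k) != num_joints:
            raise ValueError("each eigengrasp must have one entry per joint")
        for j in range(num_joints):
            q[j] += a_k * e_k[j]
    # Assumption of this sketch: clip the result to the joint limits.
    return [min(max(q_j, lo), hi) for q_j, lo, hi in zip(q, lower, upper)]


# Hypothetical 4-joint hand described by 2 eigengrasps (values are made up).
E = [
    [1.0, 1.0, 1.0, 1.0],   # all joints flex together
    [1.0, -1.0, 0.0, 0.0],  # first two joints move in opposition
]
a = [0.5, 0.25]  # amplitudes, as an amplitude predictor might output
q = decode_articulation(a, E, lower=[0.0] * 4, upper=[1.0] * 4)
print(q)  # [0.75, 0.25, 0.5, 0.5]
```

The point of the low-dimensional space is that the predictor only has to regress K numbers (here 2) regardless of how many joints the hand has; the hand-specific structure lives in the eigengrasp rows.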

Paper Structure

This paper contains 20 sections, 14 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Visualization of generated grasps on unseen test objects. Left: ShadowHand, Middle: Allegro, Right: Barrett
  • Figure 2: Method Overview. Our framework processes the hand URDF, object point cloud, and wrist pose through specialized encoders to generate a morphology embedding $\boldsymbol{m}$, a point cloud embedding $\boldsymbol{p}$, and a pose encoding $\boldsymbol{h}$. These are fused to predict eigengrasp amplitudes $a$, which are combined with the eigengrasps $E$ to produce the final articulation vector $\boldsymbol{q}$.
  • Figure 3: Architecture of Morphology Encoder: The joint encodings are mapped into tokens and then processed by the EmbodimentTransformer. Relevant output tokens corresponding to revolute joints are concatenated and used by the morphology head and eigengrasp heads to produce the morphology embedding and eigengrasps respectively.
  • Figure 4: Real-world experiments: test objects and example predicted grasps.
  • Figure 5: Comparison of grasping strategies on the Barrett hand. Left: a form-closure grasp favored by DRO, where the wrist pose is close to the object and fingers wrap around it. Right: a force-closure grasp favored by our method with KAL, where fingertip contacts dominate.
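
To make the idea of a loss that "emphasizes fingertip-relevant motions" concrete, the sketch below weights each joint's squared error by how far that joint is from the fingertip on a straight planar finger, so an error at a base joint (which moves the fingertip more) costs more than the same error at a distal joint. This is not the paper's KAL definition, which is not given in this section; the weighting rule, the function names, and the link lengths are all hypothetical.

```python
from typing import List, Sequence


def fingertip_weights(link_lengths: Sequence[float]) -> List[float]:
    """Per-joint weights for a straight planar finger.

    Joint j is weighted by its distance to the fingertip (the sum of the
    lengths of link j and all links after it), normalized to sum to 1.
    """
    reach = []
    remaining = sum(link_lengths)
    for length in link_lengths:
        reach.append(remaining)
        remaining -= length
    total = sum(reach)
    return [r / total for r in reach]


def weighted_joint_loss(
    pred: Sequence[float], target: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted sum of squared joint-angle errors."""
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))


# Hypothetical 3-joint finger; link lengths are made up for illustration.
w = fingertip_weights([1.0, 2.0, 1.0])
print(w)  # [0.5, 0.375, 0.125]

target = [0.0, 0.0, 0.0]
base_loss = weighted_joint_loss([0.5, 0.0, 0.0], target, w)  # error at base joint
tip_loss = weighted_joint_loss([0.0, 0.0, 0.5], target, w)   # same error, distal joint
print(base_loss, tip_loss)  # 0.125 0.03125
```

With these toy weights, the same 0.5 rad error is penalized four times more at the base joint than at the distal joint, which is one simple way a loss can favor accurate fingertip placement.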