One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation

Zhenyu Wei, Yunchao Yao, Mingyu Ding

TL;DR

The paper tackles the fragmentation of dexterous manipulation policies by introducing a canonical hand representation: a parameter space and a canonical URDF that unify diverse hand morphologies. It demonstrates a structured morphology latent space via a VAE, preserves kinematic fidelity in in-hand tasks, and enables a single cross-embodiment grasping policy that generalizes to unseen hands, including LEAP variants, with effective sim-to-real transfer. The unified 22-DoF action space and bidirectional joint mappings facilitate cross-embodiment training and zero-shot generalization, offering a scalable foundation for universal dexterous manipulation. This framework paves the way for morphology-aware learning across heterogeneous hardware and holds potential for extending to humanoid and broader manipulation scenarios.
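One way to picture the bidirectional joint mapping is as an index map between a hand's native joint vector and the 22 canonical slots. The sketch below is an illustration under stated assumptions, not the paper's implementation: the class name `JointMapping`, the slot layout, and the choice to zero-fill unused canonical slots are all hypothetical.

```python
import numpy as np

CANONICAL_DOF = 22  # size of the unified action space stated in the summary


class JointMapping:
    """Hypothetical index map between native joints and canonical slots."""

    def __init__(self, canonical_index):
        # canonical_index[i] is the canonical slot driven by native joint i.
        # The layout is an assumption made for illustration only.
        self.idx = np.asarray(canonical_index, dtype=int)
        if self.idx.ndim != 1 or self.idx.size == 0:
            raise ValueError("canonical_index must be a non-empty 1-D sequence")
        if len(set(self.idx.tolist())) != self.idx.size:
            raise ValueError("each native joint needs its own canonical slot")
        if self.idx.min() < 0 or self.idx.max() >= CANONICAL_DOF:
            raise ValueError("canonical slots must lie in [0, 22)")
        # Boolean mask of canonical slots that this hand actually uses.
        self.mask = np.zeros(CANONICAL_DOF, dtype=bool)
        self.mask[self.idx] = True

    def to_canonical(self, q_native):
        """Scatter native joint values into a 22-D vector (unused slots are 0)."""
        q_native = np.asarray(q_native, dtype=float)
        if q_native.shape != self.idx.shape:
            raise ValueError("q_native length must match the mapping")
        q = np.zeros(CANONICAL_DOF)
        q[self.idx] = q_native
        return q

    def to_native(self, q_canonical):
        """Gather the slots this hand uses back into native joint order."""
        q_canonical = np.asarray(q_canonical, dtype=float)
        if q_canonical.shape != (CANONICAL_DOF,):
            raise ValueError("q_canonical must have 22 entries")
        return q_canonical[self.idx]


# Usage: a made-up 6-joint hand occupying six canonical slots.
mapping = JointMapping([0, 1, 2, 5, 6, 7])
q_native = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
q_canonical = mapping.to_canonical(q_native)
q_back = mapping.to_native(q_canonical)
```

With a map like this, a policy can always emit a 22-dimensional action, and each hand reads back only the slots it owns, so the round trip native → canonical → native returns the original joint values.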

Abstract

Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation.
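The latent interpolation mentioned in the abstract can be sketched as a straight-line walk between two latent codes. This is a minimal illustration that assumes plain linear interpolation, which the abstract does not specify; the function name `interpolate_latents` and the example codes are hypothetical, and the VAE encoder and decoder are not shown.

```python
import numpy as np


def interpolate_latents(z_a, z_b, steps):
    """Return `steps` latent codes evenly spaced on the segment from z_a to z_b.

    Assumption: linear interpolation in latent space. Each row of the result
    would be passed to a (not shown) decoder to obtain hand parameters.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.ndim != 1 or z_a.shape != z_b.shape:
        raise ValueError("z_a and z_b must be 1-D vectors of equal length")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    # Column of blend weights from 0 (pure z_a) to 1 (pure z_b).
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b


# Usage with made-up 4-D latent codes for two hands.
z_hand_a = np.array([0.0, 1.0, -1.0, 2.0])
z_hand_b = np.array([1.0, 0.0, 1.0, 0.0])
path = interpolate_latents(z_hand_a, z_hand_b, steps=5)
```

The first and last rows of `path` equal the two input codes, and the rows in between are the candidates whose decoded morphologies would be inspected for smooth transitions.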

Paper Structure (45 sections, 4 equations, 16 figures, 14 tables)

Figures (16)

  • Figure 1: We introduce a canonical hand representation that unifies diverse dexterous hands into a shared parameter space and canonical URDF format, serving as a condition for cross-embodiment policy learning. It enables dexterous grasping and zero-shot generalization to novel hand morphologies, highlighting its potential for a wide range of dexterous manipulation tasks.
  • Figure 2: Comparison of canonical and original URDFs across five dexterous hands with different finger numbers and handedness. For each hand (from left to right): canonical URDF, original URDF, overlay of initial poses, and overlay of grasp poses, showing close morphological and kinematic consistency between the canonical and original models.
  • Figure 3: Structure of the canonical URDF. A right-hand configuration is shown for clarity, but the representation is applicable to both left- and right-handed hands.
  • Figure 4: Coordinate frame inconsistencies in URDFs. (a) Global orientations vary across sources, (b) local joint frames use inconsistent axis definitions, leading to kinematic ambiguity.
  • Figure 5: Visualization of latent-space interpolation between two dexterous hands. Canonical URDFs are shown at the ends, with decoded reconstructions and interpolated morphologies in between, demonstrating smooth transitions in DoF, finger arrangement, and overall geometry.
  • ...and 11 more figures