One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
Zhenyu Wei, Yunchao Yao, Mingyu Ding
TL;DR
The paper tackles the fragmentation of dexterous manipulation policies across hand designs by introducing a canonical hand representation: a unified parameter space and a canonical URDF format that together cover diverse hand morphologies. A VAE trained on this representation yields a structured morphology latent space, the representation preserves kinematic fidelity on in-hand tasks, and a single cross-embodiment grasping policy generalizes to unseen hands, including LEAP Hand variants, with effective sim-to-real transfer. The unified 22-DoF action space and bidirectional joint mappings support cross-embodiment training and zero-shot generalization, offering a scalable foundation for universal dexterous manipulation. The framework opens a path to morphology-aware learning across heterogeneous hardware, with potential extension to humanoid and broader manipulation scenarios.
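To make the idea of a unified 22-DoF action space with bidirectional joint mappings concrete, the following is a minimal sketch, not the authors' implementation. It assumes each hand can be described by a simple index map from its native joints to canonical slots; the names `to_canonical`, `from_canonical`, and `toy_map`, the slot indices, and the choice to fill unused slots with zero are all hypothetical.

```python
from typing import Dict, List

CANONICAL_DOF = 22  # size of the unified action space stated in the text


def to_canonical(native: List[float], index_map: Dict[int, int],
                 fill: float = 0.0) -> List[float]:
    """Scatter native joint values into a 22-slot canonical vector.

    Slots that the hand does not use keep the `fill` value (an assumption).
    """
    canonical = [fill] * CANONICAL_DOF
    for native_idx, canonical_idx in index_map.items():
        canonical[canonical_idx] = native[native_idx]
    return canonical


def from_canonical(canonical: List[float],
                   index_map: Dict[int, int]) -> List[float]:
    """Gather the slots a hand actually uses back into its native joint order."""
    native = [0.0] * len(index_map)
    for native_idx, canonical_idx in index_map.items():
        native[native_idx] = canonical[canonical_idx]
    return native


# Hypothetical 3-joint hand; the slot indices are illustrative only.
toy_map = {0: 4, 1: 5, 2: 9}
native_q = [0.1, -0.2, 0.3]
canonical_q = to_canonical(native_q, toy_map)
recovered_q = from_canonical(canonical_q, toy_map)
```

In this toy setup the round trip is lossless for the joints the hand owns, which is the property a policy acting in the canonical space would rely on when its output is sent back to a specific hand.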
Abstract
Dexterous manipulation policies today largely assume fixed hand designs, which severely restricts their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., an 81.9% zero-shot success rate on a 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation.
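The interpolation between embodiments described in advantage 2 can be illustrated with a small sketch. It assumes plain linear interpolation between two latent codes, which the abstract does not specify, and it stops before decoding because the trained VAE decoder is not available here. The function name `interpolate_latents` and the two-dimensional example codes are hypothetical.

```python
from typing import List


def interpolate_latents(z_a: List[float], z_b: List[float],
                        steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the segment from z_a to z_b."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (hand A) to 1.0 (hand B)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical latent codes for two hands (real codes would come from the encoder).
z_hand_a = [0.0, 1.0]
z_hand_b = [2.0, -1.0]
path = interpolate_latents(z_hand_a, z_hand_b, steps=5)
```

Each point on `path` would then be passed through the decoder to obtain canonical hand parameters; whether the intermediate morphologies are physically meaningful depends on the learned latent structure, which is the claim the paper evaluates.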
