HumDex: Humanoid Dexterous Manipulation Made Easy

Liang Heng, Yihe Tang, Jiajun Xu, Henghui Bao, Di Huang, Yue Wang

Abstract

This paper investigates humanoid whole-body dexterous manipulation, where the efficient collection of high-quality demonstration data remains a central bottleneck. Existing teleoperation systems often suffer from limited portability, occlusion, or insufficient precision, which hinders their applicability to complex whole-body tasks. To address these challenges, we introduce HumDex, a portable teleoperation system designed for humanoid whole-body dexterous manipulation. Our system leverages IMU-based motion tracking to address the portability-precision trade-off, enabling accurate full-body tracking while remaining easy to deploy. For dexterous hand control, we further introduce a learning-based retargeting method that generates smooth and natural hand motions without manual parameter tuning. Beyond teleoperation, HumDex enables efficient collection of human motion data. Building on this capability, we propose a two-stage imitation learning framework that first pre-trains on diverse human motion data to learn generalizable priors, and then fine-tunes on robot data to bridge the embodiment gap for precise execution. We demonstrate that this approach significantly improves generalization to new configurations, objects, and backgrounds with minimal data acquisition costs. The entire system is fully reproducible and open-sourced at https://github.com/physical-superintelligence-lab/HumDex.

Paper Structure (34 sections, 3 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: The HumDex System. Our portable teleoperation system enables efficient collection of high-quality dexterous manipulation data. Left: We demonstrate data collection and autonomous policy execution on challenging tasks featuring dexterous manipulation, bimanual coordination, long-horizon planning, deformable and articulated object manipulation, and whole-body movement. Middle: We use a Unitree G1 humanoid and two 20-DoF dexterous hands. Right: By pretraining the robot policy on diverse human data, the policy generalizes to positions, objects, and backgrounds unseen in the robot data.
  • Figure 2: System Overview. (A) Our teleoperation pipeline and hand retargeting policy training. (B) Our imitation learning policy architecture. We approximate proprioceptive states missing in human data with previous-frame actions.
  • Figure 3: Evaluation Tasks and Generalization. We visualize the initial state and key steps in our evaluated tasks. In the Task 5 generalization test, robot data used for training only consists of the Seen (position, object, background) setting.
  • Figure 4: Qualitative pose reproduction on the Wuji hand. We compare an optimization-based retargeting baseline and our learning-based retargeter on canonical dexterous poses captured by the inertial glove, including touch middle finger, touch index finger, touch ring finger, and the rock sign.
  • Figure A1: System Hardware Overview. (a) The Unitree G1 Edu+ humanoid robot integrated with custom WUJI dexterous hands. (b) The human operator wearing the VIRDYN inertial motion capture suit and data gloves for immersive teleoperation. (c) Visual feedback is provided by the robot's built-in RealSense camera.
  • ...and 2 more figures
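
The Figure 2 caption notes that proprioceptive states missing from human data are approximated with previous-frame actions. The sketch below illustrates one way that substitution could look for a single trajectory. It is not taken from the HumDex codebase: the function name `proprio_from_previous_actions`, the list-of-lists trajectory layout, and the handling of the first frame (which has no predecessor) are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def proprio_from_previous_actions(
    actions: Sequence[Sequence[float]],
    initial_state: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Build a proxy proprioceptive state for every frame of a trajectory.

    Frame t receives the action recorded at frame t-1. Frame 0 has no
    predecessor, so it uses `initial_state` when provided and otherwise a
    copy of the first action (both fallbacks are assumptions of this sketch).
    """
    if not actions:
        return []

    first = list(initial_state) if initial_state is not None else list(actions[0])
    if len(first) != len(actions[0]):
        raise ValueError("initial_state must match the action dimension")

    # Shift the action sequence forward by one frame.
    return [first] + [list(a) for a in actions[:-1]]


# Hypothetical three-frame trajectory with two action dimensions.
human_actions = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
proxy_states = proprio_from_previous_actions(human_actions)
```

With this shift, human demonstrations and robot demonstrations expose the same input fields to the policy, even though only the robot data contains measured joint states.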