DexH2R: Task-oriented Dexterous Manipulation from Human to Robots

Shuqi Zhao, Xinghao Zhu, Yuxin Chen, Chenran Li, Lichen Xie, Xiang Zhang, Mingyu Ding, Masayoshi Tomizuka

TL;DR

DexH2R tackles dexterous manipulation by bridging the gap between human demonstrations and robotic hands without real-time human intervention. It introduces a two-stage pipeline that first generates primitive actions through kinematic retargeting of augmented human trajectories and then refines them with a task-oriented residual policy trained by goal-conditioned RL using environment and object cues. The method employs MANO-based data augmentation, an optimization-driven retargeting step, and a two-stage reward that guides hand approach and object-trajectory tracking. It outperforms baselines across three embodiments on both seen and unseen objects. The findings show strong generalization, sim-to-real viability, and significant efficiency gains, indicating that DexH2R is a scalable approach to autonomous dexterous manipulation with reduced human labor.
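
The two-stage reward mentioned above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `two_stage_reward`, the `contact_dist` threshold, the constant bonus of 1.0, and the use of a mean keypoint distance as the tracking error. The sketch only shows the general shape of such a reward: an approach term while the hand is far from the object, then an object-trajectory tracking term once it is close.

```python
import numpy as np

def two_stage_reward(hand_pos, object_pos, object_kps, goal_kps, contact_dist=0.05):
    """Hypothetical two-stage reward: approach first, then track the object goal."""
    hand_obj_dist = np.linalg.norm(np.asarray(hand_pos) - np.asarray(object_pos))
    if hand_obj_dist > contact_dist:
        # Stage 1 (assumed form): penalize the hand-to-object distance.
        return -hand_obj_dist
    # Stage 2 (assumed form): constant bonus minus the mean distance between
    # the current object keypoints and the desired keypoints at this step.
    kp_error = np.linalg.norm(np.asarray(object_kps) - np.asarray(goal_kps), axis=-1).mean()
    return 1.0 - kp_error

# Six illustrative keypoints placed along the object axes (an assumption).
kps = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float) * 0.05
far = two_stage_reward([0.3, 0.0, 0.0], [0.0, 0.0, 0.0], kps, kps)
near = two_stage_reward([0.01, 0.0, 0.0], [0.0, 0.0, 0.0], kps, kps + [0.1, 0.0, 0.0])
```

The constant bonus keeps the stage-two reward above the stage-one reward in this toy setup, so reaching the object is never penalized relative to hovering nearby. The paper's actual reward terms and weights may differ.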

Abstract

Dexterous manipulation is a critical aspect of human capability, enabling interaction with a wide variety of objects. Recent advances in learning from human demonstrations and teleoperation have brought robots closer to this ability. However, these approaches either require complex data collection, such as costly human effort for eye-robot contact, or suffer from poor generalization when faced with novel scenarios. To address both challenges, we propose a framework, DexH2R, that combines human hand motion retargeting with a task-oriented residual action policy, improving task performance by bridging the embodiment gap between human and robotic dexterous hands. Specifically, DexH2R learns the residual policy directly from retargeted primitive actions and task-oriented rewards, eliminating the need for labor-intensive teleoperation systems. Moreover, we incorporate test-time guidance for novel scenarios by taking in desired trajectories of human hands and objects, allowing the dexterous hand to acquire new skills with high generalizability. Extensive experiments in both simulation and real-world environments demonstrate the effectiveness of our work, which outperforms prior state-of-the-art methods by 40% across various settings.

Paper Structure

This paper contains 12 sections, 5 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Unlike traditional methods with human feedback [gao2022efficient, si2024tilde, qin2022one] (Fig. a), DexH2R operates without real-time human intervention (Fig. b), significantly reducing human effort, ensuring smooth operation, and outperforming baselines by approximately 40% on different robot hands (Fig. c).
  • Figure 2: Overview of the DexH2R framework. After extracting and augmenting human hand and object information from demonstrations, we obtain a large number of trajectories distributed over the entire workspace. We perform position kinematics retargeting to acquire primitive actions $\boldsymbol{a}_t^p$. Afterwards, taking in both human hand and object trajectories, we learn a residual action $\boldsymbol{a}_t^r$ that supplies task completion information. The primitive and residual actions are combined to form the final actions.
  • Figure 3: Illustration of keypoints $\{\boldsymbol{o}_{p,t}^{i}\}^6_{i=1}$ that are used to define object poses.
  • Figure 4: Simulation results showing that DexH2R generalizes to both seen and unseen objects, including those with long-tailed and rare shapes, across three different embodiments.
  • Figure 5: Data distribution of the original and augmented datasets.
  • ...and 3 more figures
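
The Figure 2 caption states that the primitive action $\boldsymbol{a}_t^p$ and the residual action $\boldsymbol{a}_t^r$ are combined into the final action, but this page does not say how. The sketch below assumes the common residual-policy form, an additive correction with a scaled, bounded residual followed by clipping to joint limits. The function name `combine_actions`, the `residual_scale` value, and the normalized limits are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def combine_actions(primitive, residual, low, high, residual_scale=0.1):
    """Assumed additive composition: final = clip(primitive + scale * residual)."""
    primitive = np.asarray(primitive, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # The residual policy output is assumed to be normalized to [-1, 1],
    # so the correction stays small relative to the retargeted action.
    correction = residual_scale * np.clip(residual, -1.0, 1.0)
    # Clip to the (assumed) joint limits so the command remains valid.
    return np.clip(primitive + correction, low, high)

a_p = np.array([0.20, 0.95, -0.30])  # retargeted primitive action (toy values)
a_r = np.array([0.5, 1.0, -2.0])     # raw residual policy output (toy values)
a_t = combine_actions(a_p, a_r, low=-1.0, high=1.0)
```

Bounding the residual keeps the retargeted motion as the dominant signal, so under this assumption the learned policy only makes local corrections for task completion.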