Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations

Bohan Zhou, Haoqi Yuan, Yuhui Fu, Zongqing Lu

TL;DR

This work tackles learning diverse bimanual dexterous manipulation by leveraging abundant human demonstrations to auto-construct tasks and train multi-task policies. BiDexHD employs a two-stage reward-based teacher learning framework and distills into a vision-based student that operates on point clouds, enabling scalable deployment. On the TACO dataset, BiDexHD achieves 74.59% task fulfillment on trained tasks and 51.07% on unseen tasks, indicating strong learning and competitive zero-shot generalization. By eliminating reliance on hand-crafted tasks and per-task rewards, the framework advances toward universal bimanual dexterous manipulation.

Abstract

Bimanual dexterous manipulation is a critical yet underexplored area in robotics. Its high-dimensional action space and inherent task complexity present significant challenges for policy learning, and the limited task diversity in existing benchmarks hinders general-purpose skill development. Existing approaches largely depend on reinforcement learning, often constrained by intricately designed reward functions tailored to a narrow set of tasks. In this work, we present a novel approach for efficiently learning diverse bimanual dexterous skills from abundant human demonstrations. Specifically, we introduce BiDexHD, a framework that unifies task construction from existing bimanual datasets and employs teacher-student policy learning to address all tasks. The teacher learns state-based policies using a general two-stage reward function across tasks with shared behaviors, while the student distills the learned multi-task policies into a vision-based policy. With BiDexHD, scalable learning of numerous bimanual dexterous skills from auto-constructed tasks becomes feasible, offering promising advances toward universal bimanual dexterous manipulation. Our empirical evaluation on the TACO dataset, spanning 141 tasks across six categories, demonstrates a task fulfillment rate of 74.59% on trained tasks and 51.07% on unseen tasks, showcasing the effectiveness and competitive zero-shot generalization capabilities of BiDexHD. For videos and more information, visit our project page https://sites.google.com/view/bidexhd.

Paper Structure (31 sections, 8 equations, 5 figures, 18 tables, 1 algorithm)

Figures (5)

  • Figure 1: The three-phase framework, BiDexHD, unifies constructing and solving tasks from human bimanual datasets instead of existing benchmarks. In phase one, BiDexHD constructs each bimanual task from a human demonstration. In phase two, BiDexHD learns diverse state-based policies from a general two-stage reward function via multi-task reinforcement learning. In phase three, the learned policies are distilled into a single vision-based policy for inference.
  • Figure 2: General two-stage teacher learning. For each task $\mathcal{T}^i$, all joint poses are initialized to the zero pose and the tool-object pair is initialized at a fixed pose at stage zero. At stage one, the approaching reward $r_\text{appro}$ encourages both hands to get close to their grasping centers $\hat{\mathbf{x}}_{\text{gc}}$, and the lifting reward $r_\text{lift}$, along with an extra bonus $r_\text{bonus}$, incentivizes moving both objects to their respective reference poses. After simulation alignment, both hands manipulate the objects under the guidance of the tracking reward $r_\text{track}$.
  • Figure 3: A comparison of grasping pose during policy deployment between BiDexHD-IPPO (w/o gc) and BiDexHD-IPPO.
  • Figure 4: Task visualization of (pour in some, cup, teapot).
  • Figure 5: Task visualization of (empty, bowl, bowl).
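
The two-stage reward described in the Figure 2 caption can be illustrated with a rough sketch. This is not the paper's implementation: the caption names the terms ($r_\text{appro}$, $r_\text{lift}$, $r_\text{bonus}$, $r_\text{track}$) but not their functional forms, so the negative-distance and exponential shapes, the `lift_tol`, `bonus`, and `scale` parameters, the `HandObjectState` container, and the `aligned` flag that switches stages are all assumptions made for illustration. Positions stand in for full poses to keep the example short.

```python
import math
from dataclasses import dataclass
from typing import Sequence

Vec3 = Sequence[float]


@dataclass
class HandObjectState:
    """Hypothetical per-hand snapshot; field names are illustrative only."""
    palm_pos: Vec3       # current hand position
    grasp_center: Vec3   # grasping center, \hat{x}_gc in the caption
    object_pos: Vec3     # current position of the object held by this hand
    reference_pos: Vec3  # reference position taken from the demonstration


def stage_one_reward(states, lift_tol=0.05, bonus=1.0):
    """Approach + lift + bonus, summed over both hands (assumed forms)."""
    total = 0.0
    for s in states:
        # r_appro: pull the hand toward its grasping center.
        r_appro = -math.dist(s.palm_pos, s.grasp_center)
        # r_lift: pull the object toward its reference position.
        lift_err = math.dist(s.object_pos, s.reference_pos)
        r_lift = -lift_err
        # r_bonus: extra reward once the object is close enough (assumed threshold).
        r_bonus = bonus if lift_err < lift_tol else 0.0
        total += r_appro + r_lift + r_bonus
    return total


def stage_two_reward(states, scale=10.0):
    """r_track: reward staying close to the reference (assumed exponential form)."""
    return sum(
        math.exp(-scale * math.dist(s.object_pos, s.reference_pos))
        for s in states
    )


def two_stage_reward(states, aligned):
    """Use stage one until alignment is reached, then the tracking reward."""
    return stage_two_reward(states) if aligned else stage_one_reward(states)
```

Because the same function is evaluated for every task, only the reference positions and grasping centers change from task to task, which is the property that makes a shared reward across auto-constructed tasks plausible.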