Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations

Bohan Zhou; Haoqi Yuan; Yuhui Fu; Zongqing Lu

TL;DR

This work tackles learning diverse bimanual dexterous manipulation by leveraging abundant human demonstrations to auto-construct tasks and train multi-task policies. BiDexHD employs a two-stage reward-based teacher learning framework and distills into a vision-based student that operates on point clouds, enabling scalable deployment. On the TACO dataset, BiDexHD achieves 74.59% task fulfillment on trained tasks and 51.07% on unseen tasks, indicating strong learning and competitive zero-shot generalization. By eliminating reliance on hand-crafted tasks and per-task rewards, the framework advances toward universal bimanual dexterous manipulation.

Abstract

Bimanual dexterous manipulation is a critical yet underexplored area in robotics. Its high-dimensional action space and inherent task complexity present significant challenges for policy learning, and the limited task diversity in existing benchmarks hinders general-purpose skill development. Existing approaches largely depend on reinforcement learning, often constrained by intricately designed reward functions tailored to a narrow set of tasks. In this work, we present a novel approach for efficiently learning diverse bimanual dexterous skills from abundant human demonstrations. Specifically, we introduce BiDexHD, a framework that unifies task construction from existing bimanual datasets and employs teacher-student policy learning to address all tasks. The teacher learns state-based policies using a general two-stage reward function across tasks with shared behaviors, while the student distills the learned multi-task policies into a vision-based policy. With BiDexHD, scalable learning of numerous bimanual dexterous skills from auto-constructed tasks becomes feasible, offering promising advances toward universal bimanual dexterous manipulation. Our empirical evaluation on the TACO dataset, spanning 141 tasks across six categories, demonstrates a task fulfillment rate of 74.59% on trained tasks and 51.07% on unseen tasks, showcasing the effectiveness and competitive zero-shot generalization capabilities of BiDexHD. For videos and more information, visit our project page https://sites.google.com/view/bidexhd.

Paper Structure (31 sections, 8 equations, 5 figures, 18 tables, 1 algorithm)

Introduction
Related Work
Bimanual Dexterous Manipulation
Learning Dexterity From Human Demonstrations
Preliminaries
Task Formulation
Teacher-Student Learning
Learning Bimanual Dexterity From Human Demonstrations
Overview
Task Construction From Bimanual Dataset
Multi-Task State-Based Policy Learning
Vision-Based Policy Distillation
Experiments
Setups
Teacher Learning
...and 16 more sections

Figures (5)

Figure 1: The three-phase framework, BiDexHD, unifies constructing and solving tasks from human bimanual datasets instead of existing benchmarks. In phase one, BiDexHD constructs each bimanual task from a human demonstration. In phase two, BiDexHD learns diverse state-based policies from a generally designed two-stage reward function via multi-task reinforcement learning. In phase three, the group of learned policies is distilled into a vision-based policy for inference.
Figure 2: General two-stage teacher learning. For each task $\mathcal{T}^i$, at stage zero all joints are initialized at the zero pose and the tool-object pair is initialized at a fixed pose. At stage one, the approaching reward $r_\text{appro}$ encourages both hands to get close to their grasping centers $\hat{\mathbf{x}}_{\text{gc}}$, and the lifting reward $r_\text{lift}$, along with an extra bonus $r_\text{bonus}$, incentivizes moving both objects to their respective reference poses. After simulation alignment, the two hands manipulate the objects under the guidance of the tracking reward $r_\text{track}$.
Figure 3: A comparison of grasping pose during policy deployment between BiDexHD-IPPO (w/o gc) and BiDexHD-IPPO.
Figure 4: Task visualization of (pour in some, cup, teapot).
Figure 5: Task visualization of (empty, bowl, bowl).
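
The staged reward described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the caption names the terms $r_\text{appro}$, $r_\text{lift}$, $r_\text{bonus}$, and $r_\text{track}$ but not their functional forms, so the negative-distance and exponential shapes below, the `HandObjectState` container, the `bonus_radius` and `bonus` parameters, and the labeling of the post-alignment phase as stage 2 are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class HandObjectState:
    """Hypothetical per-hand state; every field is an (x, y, z) tuple."""
    hand_pos: tuple      # current hand position
    grasp_center: tuple  # grasping center the hand should approach
    obj_pos: tuple       # current position of the held tool or object
    ref_pos: tuple       # reference position taken from the demonstration


def two_stage_reward(left, right, stage, bonus_radius=0.05, bonus=1.0):
    """Illustrative staged reward; the functional forms are assumed."""
    sides = (left, right)
    if stage == 1:
        # Approaching term: pull each hand toward its grasping center.
        r_appro = -sum(math.dist(s.hand_pos, s.grasp_center) for s in sides)
        # Lifting term: pull each object toward its reference position.
        r_lift = -sum(math.dist(s.obj_pos, s.ref_pos) for s in sides)
        # Bonus term: paid only when both objects are near their references.
        near = all(math.dist(s.obj_pos, s.ref_pos) < bonus_radius for s in sides)
        r_bonus = bonus if near else 0.0
        return r_appro + r_lift + r_bonus
    if stage == 2:
        # Tracking term: reward staying close to the reference trajectory.
        return sum(math.exp(-math.dist(s.obj_pos, s.ref_pos)) for s in sides)
    raise ValueError("stage must be 1 or 2")


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    aligned = HandObjectState(origin, origin, origin, origin)
    print(two_stage_reward(aligned, aligned, stage=1))
    print(two_stage_reward(aligned, aligned, stage=2))
```

Because every term depends only on hand positions, grasping centers, and reference poses, the same function can in principle be reused across tasks, which is the property the caption emphasizes.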
