DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations

Zoey Qiuyu Chen, Karl Van Wyk, Yu-Wei Chao, Wei Yang, Arsalan Mousavian, Abhishek Gupta, Dieter Fox

TL;DR

DexTransfer addresses the challenge of teaching a multi-fingered dexterous robot to grasp objects in unseen poses from a small number of human demonstrations by converting mocap data into a large, diverse, dynamically feasible dataset. It uses a three-stage pipeline (trajectory retargeting, refinement with correlated sampling, and data augmentation) to produce training data for a supervised policy that maps object point clouds and recent state to palm translation, palm rotation, and finger joint actions. The approach generalizes to unseen object poses in simulation and transfers to a real Allegro hand mounted on a KUKA arm. Ablation studies quantify the contributions of domain randomization, augmentation, and data funneling.

Abstract

Teaching a multi-fingered dexterous robot to grasp objects in the real world has been a challenging problem due to its high-dimensional state and action space. We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses given partially occluded observations. Our system leverages a small motion capture dataset and generates a large dataset with diverse and successful trajectories for a multi-fingered robot gripper. By adding domain randomization, we show that our dataset provides robust grasping trajectories that can be transferred to a policy learner. We train a dexterous grasping policy that takes the point clouds of the object as input and predicts continuous actions to grasp objects from different initial robot states. We evaluate the effectiveness of our system on a 22-DoF floating Allegro Hand in simulation and a 23-DoF Allegro robot hand with a KUKA arm in the real world. The policy learned from our dataset generalizes well to unseen object poses in both simulation and the real world.
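As a rough illustration of how a few reference trajectories could be expanded into many successful ones, the sketch below perturbs a retargeted trajectory with temporally correlated noise and keeps only the candidates that pass a success check. This is a minimal sketch under stated assumptions, not the authors' implementation: the AR(1) noise model, the names `correlated_noise` and `refine`, the parameters `scale` and `rho`, and the `is_successful` callback (a stand-in for a simulator rollout) are all hypothetical.

```python
import random


def correlated_noise(num_steps, dim, scale=0.01, rho=0.9, rng=None):
    """Return AR(1) noise: each step keeps a fraction rho of the previous offset.

    The AR(1) form is an assumption made for this sketch; the source does not
    specify the noise model behind its "correlated sampling".
    """
    rng = rng or random.Random()
    noise, prev = [], [0.0] * dim
    for _ in range(num_steps):
        prev = [rho * p + scale * rng.gauss(0.0, 1.0) for p in prev]
        noise.append(prev)
    return noise


def refine(reference, is_successful, num_samples=100, seed=0, **noise_kwargs):
    """Perturb a reference trajectory and keep the candidates that succeed.

    reference: list of time steps, each a list of joint/pose values.
    is_successful: hypothetical callback standing in for a simulator rollout.
    """
    rng = random.Random(seed)
    dim = len(reference[0])
    kept = []
    for _ in range(num_samples):
        noise = correlated_noise(len(reference), dim, rng=rng, **noise_kwargs)
        candidate = [
            [q + n for q, n in zip(step, eps)]
            for step, eps in zip(reference, noise)
        ]
        if is_successful(candidate):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Toy 22-dimensional reference trajectory with 10 time steps.
    reference = [[0.1 * t] * 22 for t in range(10)]
    # Toy success check: the final first coordinate stays near the reference.
    near_goal = lambda traj: abs(traj[-1][0] - reference[-1][0]) < 0.05
    print(len(refine(reference, near_goal, num_samples=50)))
```

Because the noise is correlated across time steps, each perturbed trajectory drifts smoothly away from the reference instead of jittering independently at every step, which is one plausible way to obtain diverse trajectories that are still dynamically feasible.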

Paper Structure

This paper contains 23 sections, 3 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 2: Overview of the proposed framework DexTransfer. The mocap data is first retargeted to a dexterous gripper in simulation. This motion reference is then refined and augmented into a large and diverse set of successful trajectories, to learn a policy to succeed on unseen object poses and initial hand poses. The learned policy is eventually transferred to a real robot system.
  • Figure 3: Network architecture for the point-cloud-dependent policy $\pi_{\theta}$. The Scene Encoder consists of three PointNet++ SA (set abstraction) modules followed by two fully connected layers. The Kinematics Encoder consists of three residual modules. The Fusion Layer feeds the concatenated features into one linear layer followed by two residual modules. The network has three branches that predict palm translation, palm rotation, and joint angles. Each branch consists of three residual modules.
  • Figure 4: Qualitative results of policies on unseen poses of various objects in both simulation and the real world. The hand approaches the objects from a variety of approach angles and grasp poses, showing a range of grasping strategies.
  • Figure 5: Real-world experiments with a 23-DoF KUKA Allegro robot tested on 5 objects. Each object is evaluated 25 times.
  • Figure 6: Refined successful trajectories from human demonstrations to diverse robot grippers.
  • ...and 4 more figures
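
The three output branches described in the Figure 3 caption (palm translation, palm rotation, joint angles) suggest how a 22-DoF action for the floating hand might be laid out. The sketch below is illustrative only and rests on assumptions: the Allegro hand's 16 finger joints (4 fingers with 4 joints each) plus a 6-DoF floating palm account for the 22 DoF, but the 3-parameter rotation and the ordering of the slices are guesses, and `HandAction` and `split_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

PALM_TRANSLATION_DIM = 3  # x, y, z of the floating palm
PALM_ROTATION_DIM = 3     # assumed 3-parameter rotation (not confirmed by the source)
FINGER_JOINT_DIM = 16     # Allegro hand: 4 fingers x 4 joints
ACTION_DIM = PALM_TRANSLATION_DIM + PALM_ROTATION_DIM + FINGER_JOINT_DIM  # 22


@dataclass
class HandAction:
    """One action, grouped the way the three policy branches are described."""
    palm_translation: List[float]
    palm_rotation: List[float]
    finger_joints: List[float]


def split_action(flat: Sequence[float]) -> HandAction:
    """Split a flat 22-dimensional action into its three groups.

    The slice order (translation, rotation, joints) is an assumption.
    """
    if len(flat) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(flat)}")
    t_end = PALM_TRANSLATION_DIM
    r_end = t_end + PALM_ROTATION_DIM
    return HandAction(
        palm_translation=list(flat[:t_end]),
        palm_rotation=list(flat[t_end:r_end]),
        finger_joints=list(flat[r_end:]),
    )


if __name__ == "__main__":
    action = split_action([0.0] * ACTION_DIM)
    print(len(action.finger_joints))
```

On the real system the floating palm is replaced by the KUKA arm, which is where the 23-DoF figure comes from if the arm contributes 7 joints; that mapping is not covered by this sketch.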