DexRepNet++: Learning Dexterous Robotic Manipulation with Geometric and Spatial Hand-Object Representations

Qingtao Liu, Zhengnan Sun, Yu Cui, Haoming Li, Gaofeng Li, Lin Shao, Jiming Chen, Qi Ye

Abstract

Robotic dexterous manipulation is a challenging problem due to the high degrees of freedom (DoFs) and complex contacts of multi-fingered robotic hands. Many existing deep reinforcement learning (DRL) based methods aim at improving sample efficiency in high-dimensional output action spaces. However, existing works often overlook the role of representations in achieving generalization of a manipulation policy in the complex input space during hand-object interaction. In this paper, we propose DexRep, a novel hand-object interaction representation that captures object surface features and spatial relations between hands and objects for dexterous manipulation skill learning. Based on DexRep, policies are learned for three dexterous manipulation tasks, namely grasping, in-hand reorientation, and bimanual handover, and extensive experiments are conducted to verify their effectiveness. In simulation, for grasping, the policy learned with 40 objects achieves a success rate of 87.9% on more than 5000 unseen objects of diverse categories, significantly surpassing existing work trained with thousands of objects; for the in-hand reorientation and handover tasks, the policies improve success rates and other metrics over existing hand-object representations by 20% to 40%. The grasp policies with DexRep are deployed to the real world under multi-camera and single-camera setups and demonstrate a small sim-to-real gap.

Paper Structure (56 sections, 25 equations, 18 figures, 9 tables)

Figures (18)

  • Figure 1: Policies learned with our hand-object representation perform grasping, in-hand reorientation, and handover tasks with five-fingered dexterous robotic hands.
  • Figure 2: DexRep and its integration into dexterous manipulation learning. Left: Visualization of the three components of DexRep (Occupancy, Surface, and Local-Geo features), each encoding a different aspect of the hand-object interaction. Right: Policy learning framework with DexRep as input. The dashed box denotes the representation for the second hand in bimanual settings, which is omitted in single-hand scenarios.
  • Figure 3: Demonstration acquisition for behavior cloning (BC). In the grasping task, we obtain human demonstration data $\mathcal{D}_{\text{human}}$ from the GRAB dataset [taheri2020grab] and retarget it to the Adroit hand [kumar2013adroithand] to generate robot demonstration data $\mathcal{D}$, which is then used for BC initialization of the policy $\pi_\theta$.
  • Figure 4: Our pipeline to learn manipulation policy with DexRep. For the grasping task, we first pre-train the policy using BC and then fine-tune it through RL. For in-hand reorientation and handover tasks, we start with a randomly initialized policy and learn the strategy from scratch using RL.
  • Figure 5: Retargeting process. Key vectors—finger-to-finger, finger-to-wrist, and finger-to-object—are computed from both human (MANO) and robotic (Adroit) hands and optimized to align hand postures.
  • ...and 13 more figures
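
The retargeting idea in the Figure 5 caption can be illustrated with a minimal sketch. The caption states only that finger-to-finger, finger-to-wrist, and finger-to-object key vectors are computed for both hands and optimized to align postures; the squared-difference cost, the `scale` factor, and the function names `key_vectors` and `retarget_cost` below are assumptions made for illustration, not the paper's formulation. An optimizer (not shown) would adjust the robot joint angles to reduce this cost.

```python
from itertools import combinations


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def key_vectors(fingertips, wrist, obj_center):
    """Collect the three families of key vectors named in the caption.

    fingertips: list of (x, y, z) fingertip positions
    wrist, obj_center: (x, y, z) positions (hypothetical inputs)
    """
    f2f = [sub(b, a) for a, b in combinations(fingertips, 2)]  # finger-to-finger
    f2w = [sub(tip, wrist) for tip in fingertips]              # finger-to-wrist
    f2o = [sub(tip, obj_center) for tip in fingertips]         # finger-to-object
    return f2f + f2w + f2o


def retarget_cost(human_vecs, robot_vecs, scale=1.0):
    """Assumed cost: sum of squared differences between scaled human
    key vectors and the corresponding robot key vectors."""
    return sum(
        (scale * h - r) ** 2
        for hv, rv in zip(human_vecs, robot_vecs)
        for h, r in zip(hv, rv)
    )
```

A cost of zero means the two hands have identical key vectors under the chosen scale. Finger-to-finger vectors are unaffected by a rigid shift of all fingertips, so the wrist and object terms are what tie the posture to the rest of the scene.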