Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy

Tianhao Wu, Yunchong Gan, Mingdong Wu, Jingbo Cheng, Yaodong Yang, Yixin Zhu, Hao Dong

TL;DR

This work tackles dexterous functional pre-grasp manipulation, where objects must be repositioned and reoriented to achieve functional grasp poses. It introduces a teacher-student framework that uses a novel mutual reward, a mixture-of-experts policy, and a diffusion policy to model complex, high-DOF manipulation and generalize across diverse objects and goal poses. Through offline imitation learning from multiple experts, the diffusion-based student can achieve teacher-level performance and robustly leverage extrinsic dexterity, reporting 72.6% success across 30+ object categories. The approach advances generalizable pre-grasp manipulation with practical potential for real-world functional grasping, while noting challenges with irregular geometries and sim-to-real transfer.

Abstract

In real-world scenarios, objects often require repositioning and reorientation before they can be grasped, a process known as pre-grasp manipulation. Learning universal dexterous functional pre-grasp manipulation requires precise control over the relative position, orientation, and contact between the hand and object while generalizing to diverse dynamic scenarios with varying objects and goal poses. To address this challenge, we propose a teacher-student learning approach that utilizes a novel mutual reward, incentivizing agents to optimize three key criteria jointly. Additionally, we introduce a pipeline that employs a mixture-of-experts strategy to learn diverse manipulation policies, followed by a diffusion policy to capture complex action distributions from these experts. Our method achieves a success rate of 72.6% across more than 30 object categories by leveraging extrinsic dexterity and adjusting from feedback.

Paper Structure

This paper contains 23 sections, 7 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Our closed-loop manipulation policy continuously repositions and reorients diverse objects to match the functional grasp goal poses successfully. (a) The dexterous functional pre-grasp manipulation. (b) The functional grasp goal poses.
  • Figure 2: Pipeline. (a) An Autoencoder learns latent representations based on the object-hand point cloud. (b) K-Means clusters the training set into N clusters based on the learned representations. (c) Learning an expert for each cluster based on mutual reward. (d) Distilling multi-expert knowledge into a single student using diffusion for dexterous functional pre-grasp manipulation of seen and unseen objects.
  • Figure 3: Adaptability of Our Learned Policy. Although our agent may initially fail to manipulate objects, it adjusts its policy on the second attempt, successfully manipulating them. This capability helps the agent handle diverse dynamics.
  • Figure 4: Success Rate of Different Object Categories.
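
Step (b) of the Figure 2 pipeline, clustering the training set into N clusters so that one expert can be trained per cluster, can be illustrated with a small sketch. This is not the authors' implementation: the 2-D latent vectors are made-up stand-ins for the autoencoder's point-cloud embeddings, the names `kmeans` and `assign_cluster` are hypothetical, and a deterministic farthest-point initialization is assumed in place of whatever initialization the paper's K-Means uses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(latent, centroids):
    """Index of the centroid closest to a latent vector."""
    dists = [sq_dist(latent, c) for c in centroids]
    return dists.index(min(dists))


def kmeans(latents, n_clusters, n_iters=20):
    """Lloyd's k-means with farthest-point initialization (an assumption)."""
    # Start from the first sample, then repeatedly add the sample that is
    # farthest from all centroids chosen so far.
    centroids = [list(latents[0])]
    while len(centroids) < n_clusters:
        farthest = max(latents, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    for _ in range(n_iters):
        # Assignment step: each sample joins its nearest centroid.
        labels = [assign_cluster(p, centroids) for p in latents]
        # Update step: each centroid moves to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(latents, labels) if lab == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]

    labels = [assign_cluster(p, centroids) for p in latents]
    return centroids, labels


# Made-up latent codes forming two well-separated groups.
latents = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],
]
centroids, labels = kmeans(latents, n_clusters=2)

# Each cluster index selects the subset of samples one expert would train on.
expert_training_sets = {
    k: [p for p, lab in zip(latents, labels) if lab == k] for k in range(2)
}
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the pipeline described in Figure 2, each resulting subset would be used to train one expert with the mutual reward, and the experts' behavior would then be distilled into the single diffusion-based student. Those later stages are not shown in this sketch.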