UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Weikang Wan; Haoran Geng; Yun Liu; Zikang Shan; Yaodong Yang; Li Yi; He Wang

TL;DR

This work tackles universal dexterous grasping from realistic point clouds by introducing geometry-aware curriculum learning (GeoCurriculum) and geometry-aware iterative Generalist-Specialist Learning (GiGSL). The two-stage pipeline first trains a geometry-informed state-based generalist with GeoCurriculum and improves it through GiGSL. It then distills the final state-based specialists into a vision-based generalist, which is refined through further GiGSL rounds. In both stages, GeoClustering partitions tasks among specialists by geometry. The approach generalizes across 3000+ objects, reaching state-of-the-art success rates of 85.4% on the train set and 78.2% on the test set, 11.7% and 11.3% above the previous UniDexGrasp framework, with additional validation in Meta-World. By emphasizing geometry and structured distillation, the methods aim to narrow the sim-to-real gap and support robust, scalable dexterous manipulation.
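GeoClustering groups objects by geometric features rather than by category labels. The sketch below illustrates that idea under stated assumptions: each object is already summarized by a feature vector (in the paper, from a pre-trained point cloud encoder; here, toy 2-D vectors), and the grouping uses plain k-means with farthest-point initialization. The function name `geo_cluster` and the choice of k-means are hypothetical, not the authors' implementation.

```python
import math
from typing import Dict, Sequence


def geo_cluster(features: Dict[str, Sequence[float]], k: int, iters: int = 100) -> Dict[str, int]:
    """Hypothetical GeoClustering sketch: k-means over per-object geometry features.

    `features` maps an object id to a feature vector, assumed to come from a
    pre-trained point cloud encoder. Returns object id -> cluster index in [0, k).
    """
    ids = sorted(features)
    if not 1 <= k <= len(ids):
        raise ValueError("k must be between 1 and the number of objects")

    # Farthest-point initialization (an assumption, chosen for determinism):
    # start from the first object, then repeatedly add the object farthest
    # from all centroids chosen so far.
    centroids = [list(features[ids[0]])]
    while len(centroids) < k:
        far = max(ids, key=lambda i: min(math.dist(features[i], c) for c in centroids))
        centroids.append(list(features[far]))

    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each object joins its nearest centroid (Euclidean).
        new_assign = {
            i: min(range(k), key=lambda c: math.dist(features[i], centroids[c]))
            for i in ids
        }
        if new_assign == assign:
            break  # converged: no object changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean feature of its members.
        for c in range(k):
            members = [features[i] for i in ids if assign[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign


if __name__ == "__main__":
    toy = {"mug_a": [0.0, 0.1], "mug_b": [0.1, 0.0], "box_a": [5.0, 5.1], "box_b": [5.2, 4.9]}
    print(geo_cluster(toy, k=2))  # {'box_a': 0, 'box_b': 0, 'mug_a': 1, 'mug_b': 1}
```

Each resulting cluster would then be handed to one specialist. Farthest-point initialization is used only so the toy result is deterministic; random restarts would be a reasonable alternative.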

Abstract

We propose UniDexGrasp++, a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information in a table-top setting. To address the challenge of learning a vision-based policy across thousands of object instances, we propose Geometry-aware Curriculum Learning (GeoCurriculum) and Geometry-aware iterative Generalist-Specialist Learning (GiGSL), which leverage the geometric features of the task and significantly improve generalizability. With these techniques, our final policy achieves universal dexterous grasping on thousands of object instances, with success rates of 85.4% on the train set and 78.2% on the test set, outperforming the state-of-the-art baseline UniDexGrasp by 11.7% and 11.3%, respectively.
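The abstract names GeoCurriculum but does not spell out its schedule. The sketch below shows one plausible shape of a geometry-aware task curriculum, under assumptions that are ours rather than the paper's: training begins on a single seed object, each later stage adds the objects closest to the seed in geometry-feature space (Euclidean distance), and the last stage covers the full object set. The function `geo_curriculum`, the seed choice, and the stage sizes are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def geo_curriculum(
    features: Dict[str, Sequence[float]], seed_id: str, stage_sizes: Sequence[int]
) -> List[List[str]]:
    """Hypothetical geometry-aware curriculum: nested object sets grown from a seed.

    Stage t trains on the stage_sizes[t] objects whose geometry features are
    closest to the seed object's; a final stage always covers every object.
    """
    if list(stage_sizes) != sorted(stage_sizes):
        raise ValueError("stage_sizes must be non-decreasing so stages are nested")
    # Rank objects by distance to the seed (ties broken by id for determinism).
    ranked = sorted(features, key=lambda i: (math.dist(features[i], features[seed_id]), i))
    stages = [ranked[:size] for size in stage_sizes if size < len(ranked)]
    stages.append(ranked)  # last stage: the whole training set
    return stages


if __name__ == "__main__":
    toy = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 0.0], "d": [10.0, 0.0]}
    for stage in geo_curriculum(toy, seed_id="a", stage_sizes=[1, 2]):
        print(stage)  # ['a'], then ['a', 'b'], then ['a', 'b', 'c', 'd']
```

In a training loop, the state-based generalist would move to the next stage once its success rate on the current stage plateaus; that switching rule is likewise an assumption.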

Paper Structure (25 sections, 9 equations, 7 figures, 10 tables, 3 algorithms)

Introduction
Related Work
Dexterous Grasping
Vision-based Policy Learning
Generalization in Imitation Learning and Policy Distillation
Problem Formulation
Method
Method Overview
iGSL: iterative Generalist-Specialist Learning
GiGSL: Geometry-aware iterative Generalist-Specialist Learning
GeoCurriculum: Geometry-aware Task Curriculum Learning
Experiment
Experiment Setting
Main Results
Analysis of the Training Process
...and 10 more sections

Figures (7)

Figure 1: In this work, we present UniDexGrasp++, a novel dexterous grasping policy learning pipeline. Like UniDexGrasp [xu2022universal], UniDexGrasp++ is trained on 3000+ different object instances with random object poses in a table-top setting. It significantly outperforms the previous state of the art, achieving 85.4% and 78.2% success rates on the train and test sets.
Figure 2: Method Overview. We first adopt a state-based policy learning stage, followed by a vision-based policy learning stage. The state-based policy takes as input the robot state $R_t$, the object state $S_t$, and the geometric feature $z$ of the first-frame scene point cloud. We leverage a geometry-aware task curriculum (GeoCurriculum) to learn the first state-based generalist policy. This generalist policy is then improved by iteratively fine-tuning specialists and distilling them back into the generalist in our proposed geometry-aware iterative generalist-specialist learning (GiGSL), where our geometry-aware clustering (GeoClustering) decides which specialist each task is assigned to. For vision-based policy learning, we first distill the final state-based specialists into an initial vision-based generalist and then apply GiGSL to the vision generalist until we obtain the final, best-performing vision-based generalist.
Figure 3: Comparison between category-label-based clustering and our geometry-aware clustering. Our state-based clustering uses features of the first-frame point clouds from the pre-trained encoder, while the vision-based policy uses its own vision backbone to extract features for clustering. Because the vision-based clustering is task-aware, the third row also shows the grasping poses of the dexterous hands.
Figure 4: Success rate during GiGSL training. We plot the success rate at each training step: green denotes the state-based policy, blue the vision-based policy, hollow points specialist policies, and solid points generalist policies.
Figure 5: Camera positions
...and 2 more figures
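The GiGSL loop in Figure 2 alternates between fine-tuning specialists on geometry-based task clusters and distilling them back into one generalist. The skeleton below captures only that control flow. Clustering, fine-tuning, and distillation are passed in as callables, because the captions here do not specify the RL algorithm or the distillation objective; the names `gigsl`, `cluster`, `finetune`, and `distill` are hypothetical.

```python
from typing import Callable, Dict, List, TypeVar

P = TypeVar("P")  # any policy representation


def gigsl(
    generalist: P,
    tasks: List[str],
    cluster: Callable[[P, List[str]], Dict[str, int]],
    finetune: Callable[[P, List[str]], P],
    distill: Callable[[List[P], List[List[str]]], P],
    rounds: int,
) -> P:
    """Hypothetical skeleton of iterative generalist-specialist learning.

    Each round: cluster tasks, fine-tune one specialist per cluster starting
    from the current generalist, then distill all specialists into a new generalist.
    """
    for _ in range(rounds):
        # GeoClustering step: the current policy may supply the features
        # (as the vision-based stage does with its backbone).
        assignment = cluster(generalist, tasks)
        groups: Dict[int, List[str]] = {}
        for task in tasks:
            groups.setdefault(assignment[task], []).append(task)
        task_sets = [groups[c] for c in sorted(groups)]
        # Specialist step: every specialist is initialized from the generalist
        # and fine-tuned only on the tasks of its own cluster.
        specialists = [finetune(generalist, ts) for ts in task_sets]
        # Generalist step: distill all specialists back into a single policy
        # that must cover the union of their task sets.
        generalist = distill(specialists, task_sets)
    return generalist
```

For the vision stage described in Figure 2, the same loop could be called with an initial vision-based generalist obtained by distilling the final state-based specialists, and with a `cluster` callable that uses the vision backbone's features.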
