AnyDexGrasp: General Dexterous Grasping for Different Hands with Human-level Learning Efficiency

Hao-Shu Fang, Hengxu Yan, Zhenyu Tang, Hongjie Fang, Chenxi Wang, Cewu Lu

TL;DR

AnyDexGrasp introduces a two-stage, hand-agnostic framework for visually guided dexterous grasping that splits the grasping process into a universal Contact-centric Grasp Representation (CGR) and a hand-specific grasp decision module. The CGR provides a transferable state space that, when combined with object-agnostic training and real-world trial data, enables high grasp success across three different hands with hundreds rather than millions of grasps, even on cluttered and adversarial object sets. The approach achieves 75-95% success with 40 training objects (400-1000 attempts) and 80-98% with 144 objects, and the analyses identify geometry coverage and dense local geometry sampling as key factors in generalization. The results highlight strong cross-hand transferability, substantial learning efficiency, and potential integration with tactile sensing for further robustness in real-world manipulation tasks.

Abstract

We introduce an efficient approach for learning dexterous grasping with minimal data, advancing robotic manipulation capabilities across different robotic hands. Unlike traditional methods that require millions of grasp labels for each robotic hand, our method achieves high performance with human-level learning efficiency: only hundreds of grasp attempts on 40 training objects. The approach separates the grasping process into two stages: first, a universal model maps scene geometry to intermediate contact-centric grasp representations, independent of specific robotic hands. Next, a unique grasp decision model is trained for each robotic hand through real-world trial and error, translating these representations into final grasp poses. Our results show a grasp success rate of 75-95% across three different robotic hands in real-world cluttered environments with over 150 novel objects, improving to 80-98% with increased training objects. This adaptable method demonstrates promising applications for humanoid robots, prosthetics, and other domains requiring robust, versatile robotic manipulation.
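
The two-stage split described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the names `Candidate` and `select_grasp`, the callable signatures, and the "pick the highest-scoring candidate" rule are all assumptions made for illustration. The only idea taken from the text is that one shared representation model feeds a separate decision model per hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Candidate:
    """Hypothetical container for one grasp candidate."""
    representation: List[float]  # hand-agnostic features (stand-in for a CGR)
    pose: Tuple[float, ...]      # candidate grasp pose (format is an assumption)


# Stage 1 (shared across hands): point cloud -> grasp candidates.
RepresentationModel = Callable[[Sequence[Point]], List[Candidate]]
# Stage 2 (one per hand): representation -> predicted success score.
DecisionModel = Callable[[List[float]], float]


def select_grasp(
    cloud: Sequence[Point],
    representation_model: RepresentationModel,
    decision_models: Dict[str, DecisionModel],
    hand: str,
) -> Optional[Tuple[float, Candidate]]:
    """Score every candidate with the chosen hand's model and return the best.

    Returns None when the representation model proposes no candidates.
    """
    candidates = representation_model(cloud)
    if not candidates:
        return None
    scorer = decision_models[hand]  # only this stage depends on the hand
    scored = [(scorer(c.representation), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])
```

Under these assumptions, supporting a new hand only means adding one entry to `decision_models`; the representation model and the candidates it produces are reused unchanged.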

Paper Structure

This paper contains 20 sections, 20 equations, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Overview of our method. (A): Our method consists of two steps. The first step trains a representation model on partial-view point clouds; the training set consists of only 40 objects. The second step freezes the representation model and trains a grasp decision model that takes the contact-centric grasp representation as input and outputs a grasp success score, based on hundreds of real-world trial-and-error attempts. The grasp algorithm is tested thoroughly on hundreds of unseen objects. (B): Illustration of the contact-centric grasp representation. The local geometry is discretized into several tangent planes along the approach direction of a robotic hand. Each tangent surface is transformed into the polar coordinate frame of the robotic hand, and its shape is encoded as discretized points and normals in polar coordinates. (C): Our experiments are also carried out on a three-finger hand and a five-finger hand and demonstrate excellent performance.
  • Figure 2: Illustration of the predefined grasp types for three robotic hands. The types are categorized by the number of fingers involved in the grasping procedure. Some types can be categorized into multiple taxonomies defined in previous literature [feix2015grasp] when the grasping depths differ.
  • Figure 3: Experimental setup. (A): Platform setting of our dexterous grasping experiments. (B): Illustration of our 40 training objects and 150 testing objects. The testing objects are much more diverse than the training objects and include deformable and adversarial objects not present in the training set.
  • Figure 4: Success rates on the testing set after training on abundant real-world data. (A): The averaged and detailed success rates of the DH-3 hand on five object categories commonly encountered in daily activities. (B): The averaged and detailed success rates of the Allegro hand. (C): The averaged and detailed success rates of the Inspire hand. (D): The success rates of the three robotic hands on the adversarial objects.
  • Figure 5: Success rates on the testing set after training on reduced real-world data. (A): We reduce the number of training objects from 144 to 40 and 30, respectively, and test the success rates on different categories of objects. (B): With 40 training objects, we reduce the data from around 1000 trials per grasp type to 100 trials and 50 trials, respectively. (C) and (D): When the training data are reduced to fewer training objects and fewer grasp attempts, success rates on both the Allegro hand and the Inspire hand decrease only slightly, showing the good generalization ability and high learning efficiency of our method.
  • ...and 5 more figures
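
The polar encoding outlined in the Figure 1(B) caption can be illustrated with a rough Python sketch. It is a simplification under stated assumptions, not the paper's encoding: points are assumed to be already expressed in the hand frame with z along the approach direction, the function name `polar_slices` and its bin counts are hypothetical, each bin keeps only the nearest radius, and surface normals (which the caption says are also encoded) are omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def polar_slices(
    points: Sequence[Point],
    depth_range: Tuple[float, float],
    num_slices: int,
    num_angles: int,
) -> List[List[Optional[float]]]:
    """Bin hand-frame points into depth slices and angular sectors.

    grid[s][a] holds the smallest radial distance of any point that falls in
    depth slice s and angular sector a, or None if the bin is empty.
    """
    z_min, z_max = depth_range
    slice_height = (z_max - z_min) / num_slices
    grid: List[List[Optional[float]]] = [
        [None] * num_angles for _ in range(num_slices)
    ]
    for x, y, z in points:
        if not (z_min <= z < z_max):
            continue  # outside the depth range covered by the slices
        s = min(int((z - z_min) / slice_height), num_slices - 1)
        theta = math.atan2(y, x) % (2.0 * math.pi)  # angle in [0, 2*pi]
        # The final modulo guards against rounding that lands on num_angles.
        a = int(theta / (2.0 * math.pi) * num_angles) % num_angles
        r = math.hypot(x, y)  # radial distance from the approach axis
        if grid[s][a] is None or r < grid[s][a]:
            grid[s][a] = r
    return grid
```

For example, `polar_slices(points, (0.0, 0.2), 2, 4)` would yield a 2-by-4 grid of radii. A fixed-size grid of this kind is one way the same local geometry could be presented to decision models for different hands.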