A Surprisingly Efficient Representation for Multi-Finger Grasping

Hengxu Yan, Hao-Shu Fang, Cewu Lu

TL;DR

This work tackles the data inefficiency and high degrees of freedom in multi-finger grasping by introducing a circular antipodal representation that densely covers a scene and can be mapped to a discretized 16-type grasp taxonomy. A lightweight decision model is trained on top of a representation model (GSNet-based) to score multi-finger grasp candidates, enabling accurate grasp poses with only hundreds to thousands of real-world attempts. The approach demonstrates strong real-robot performance, achieving 78.64% success with 500 attempts and 87% with 4500, plus 84.51% in dynamic handover tasks, validating both static robustness and temporal stability. Overall, the method reduces data requirements while maintaining high grasp success across objects, clutter, and dynamic scenarios, making multi-finger grasping more practical for real-world deployment.
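
The scoring step described above can be illustrated with a small sketch. This is not the authors' implementation: the feature size, the two-layer network, the random weights, and the names `init_decision_model`, `score_candidate`, and `select_grasp` are all hypothetical, chosen only to show how a lightweight model might map per-candidate features from a representation model to one quality score per grasp type and then pick the best pair.

```python
import math
import random

NUM_GRASP_TYPES = 16  # size of the discretized grasp taxonomy mentioned in the summary
FEATURE_DIM = 8       # hypothetical per-candidate feature size (assumption)


def init_decision_model(feature_dim, hidden_dim, num_types, seed=0):
    """Randomly initialise a tiny two-layer MLP (hypothetical architecture)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(hidden_dim, feature_dim), "w2": matrix(num_types, hidden_dim)}


def score_candidate(model, features):
    """Return one quality score in (0, 1) per grasp type for a single candidate."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in model["w1"]]
    # Output layer: one logit per grasp type, squashed with a sigmoid.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in model["w2"]]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def select_grasp(model, candidates):
    """Return (candidate index, grasp type, score) with the highest predicted score."""
    best = None
    for idx, features in enumerate(candidates):
        for grasp_type, score in enumerate(score_candidate(model, features)):
            if best is None or score > best[2]:
                best = (idx, grasp_type, score)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in features; in practice these would come from the representation model.
    candidates = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(5)]
    model = init_decision_model(FEATURE_DIM, 16, NUM_GRASP_TYPES)
    print(select_grasp(model, candidates))
```

In a real system the weights would be fitted to recorded grasp outcomes rather than drawn at random. The sketch only shows the data flow from candidate features to a chosen grasp type.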

Abstract

The problem of grasping objects using a multi-finger hand has received significant attention in recent years. However, it remains challenging to handle a large number of unfamiliar objects in real and cluttered environments. In this work, we propose a representation that can be effectively mapped to the multi-finger grasp space. Based on this representation, we develop a simple decision model that generates accurate grasp quality scores for different multi-finger grasp poses using only hundreds to thousands of training samples. We demonstrate that our representation performs well on a real robot and achieves a success rate of 78.64% after training with only 500 real-world grasp attempts and 87% with 4500 grasp attempts. Additionally, we achieve a success rate of 84.51% in a dynamic human-robot handover scenario using a multi-finger hand.

Paper Structure (20 sections, 7 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: The circular antipodal representations for multi-finger grasping. The less transparent the representation is, the higher the circular antipodal score.
  • Figure 2: (a) 2D circular antipodal representations. The depth of color denotes the level of antipodal scores. (b) 3D circular antipodal representations. (c) The coordinate system of parallel antipodal and multi-finger grasping.
  • Figure 3: Multi-finger grasp types. Each row uses the same grasp depth with different finger types, and each column uses the same finger type with different grasp depths.
  • Figure 4: The architecture of the decision model
  • Figure 5: Performance of different training data on the evaluation dataset
  • ...and 4 more figures