A Surprisingly Efficient Representation for Multi-Finger Grasping
Hengxu Yan, Hao-Shu Fang, Cewu Lu
TL;DR
This work addresses the data inefficiency and high-dimensional action space of multi-finger grasping by introducing a circular antipodal representation that densely covers a scene and maps to a discretized taxonomy of 16 grasp types. A lightweight decision model, trained on top of a GSNet-based representation model, scores multi-finger grasp candidates and learns to predict accurate grasp poses from only hundreds to thousands of real-world attempts. On a real robot, the approach achieves a 78.64% success rate after 500 training attempts and 87% after 4500, along with 84.51% in a dynamic human-robot handover task. The method thus lowers data requirements while maintaining high grasp success in cluttered and dynamic scenes, making multi-finger grasping more practical for real-world deployment.
Abstract
The problem of grasping objects using a multi-finger hand has received significant attention in recent years. However, it remains challenging to handle a large number of unfamiliar objects in real and cluttered environments. In this work, we propose a representation that can be effectively mapped to the multi-finger grasp space. Based on this representation, we develop a simple decision model that generates accurate grasp quality scores for different multi-finger grasp poses using only hundreds to thousands of training samples. We demonstrate that our representation performs well on a real robot and achieves a success rate of 78.64% after training with only 500 real-world grasp attempts and 87% with 4500 grasp attempts. Additionally, we achieve a success rate of 84.51% in a dynamic human-robot handover scenario using a multi-finger hand.
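To make the idea of a simple decision model on top of a fixed representation concrete, here is a minimal sketch, not the authors' implementation. It assumes each grasp candidate already comes with a feature vector from some representation model, and it scores the 16 grasp types with an independent logistic unit per type, updated from binary success/failure outcomes. The names `DecisionHead`, `select_grasp`, and the feature layout are hypothetical, and the choice of a logistic scorer is an assumption made purely for illustration.

```python
import math
import random

NUM_GRASP_TYPES = 16  # size of the discretized grasp taxonomy


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DecisionHead:
    """Hypothetical scorer: one logistic unit per grasp type.

    Assumption: `feature` is a fixed-length vector produced by a
    pretrained representation model; that model is not shown here.
    """

    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
            for _ in range(NUM_GRASP_TYPES)
        ]
        self.biases = [0.0] * NUM_GRASP_TYPES

    def score(self, feature):
        """Return a predicted success probability for each grasp type."""
        return [
            sigmoid(sum(w * f for w, f in zip(ws, feature)) + b)
            for ws, b in zip(self.weights, self.biases)
        ]

    def update(self, feature, grasp_type, success, lr=0.5):
        """One SGD step on binary cross-entropy for the attempted type only."""
        p = self.score(feature)[grasp_type]
        err = p - (1.0 if success else 0.0)
        ws = self.weights[grasp_type]
        for i, f in enumerate(feature):
            ws[i] -= lr * err * f
        self.biases[grasp_type] -= lr * err


def select_grasp(head, candidates):
    """Return (candidate_index, grasp_type, score) with the highest score."""
    best = None
    for idx, feature in enumerate(candidates):
        for grasp_type, s in enumerate(head.score(feature)):
            if best is None or s > best[2]:
                best = (idx, grasp_type, s)
    return best
```

In this sketch, every real grasp attempt yields one `(feature, grasp_type, success)` tuple for `update`, which is one way a small model could be fit from a few hundred to a few thousand attempts; at execution time, `select_grasp` picks the highest-scoring candidate and grasp type.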
