An Economic Framework for 6-DoF Grasp Detection
Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng
TL;DR
The paper tackles the high resource cost and slow convergence of dense-supervision-based 6-DoF grasp detection by introducing EconomicGrasp, an economic supervision framework. It maintains all grasp views to mitigate label ambiguity and employs a focal representation, consisting of an Interactive Grasp Head and a Composite Score Estimation, to learn a specific grasp efficiently under sparse supervision. A three-pronged strategy (grasp pose pruning, scene-level label aggregation, and selective loss) reduces label storage from 55 GB to 1.6 GB and cuts training time to about 1/4 and memory cost to about 1/8, while improving performance by about 3 AP on average on GraspNet-1Billion. Real-world tests and extensive ablations corroborate the method's robustness and efficiency, suggesting practical value for resource-constrained robotic grasping. The code is publicly available, and the work demonstrates a viable path toward scalable, high-performance grasping with reduced supervision requirements.
Abstract
Robotic grasping in clutter is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection that aims to economize the resource cost of training while maintaining effective grasp performance. To begin with, we discover that dense supervision is the bottleneck of current SOTA methods: it heavily increases the overall training overhead and makes training difficult to converge. To solve this problem, we first propose an economic supervision paradigm for efficient and effective grasping. This paradigm includes a well-designed supervision selection strategy, which selects key labels that are essentially free of ambiguity, and an economic pipeline that enables training after selection. Furthermore, benefiting from the economic supervision, we can focus on a specific grasp, and thus we devise a focal representation module, which comprises an interactive grasp head and a composite score estimation, to generate the specific grasp more accurately. Combining all of these, we propose the EconomicGrasp framework. Our extensive experiments show that EconomicGrasp surpasses the SOTA grasp method by about 3 AP on average at extremely low resource cost: about 1/4 of the training time, 1/8 of the memory cost, and 1/30 of the storage cost. Our code is available at https://github.com/iSEE-Laboratory/EconomicGrasp.
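To make the idea of training after label selection concrete, the following is a minimal sketch of a loss evaluated only on the candidates whose labels were kept. It is an illustration under stated assumptions, not the loss implemented in EconomicGrasp: the function name `selective_l1_loss`, the L1 error, and the toy values are all hypothetical.

```python
from typing import Sequence


def selective_l1_loss(
    predictions: Sequence[float],
    labels: Sequence[float],
    keep_mask: Sequence[bool],
) -> float:
    """Mean absolute error over the entries whose label was kept.

    Entries with keep_mask[i] == False contribute nothing, so their labels
    never need to be stored or loaded. Hypothetical illustration only; this
    is not the loss used in the paper.
    """
    if not (len(predictions) == len(labels) == len(keep_mask)):
        raise ValueError("all inputs must have the same length")
    kept = [abs(p - y) for p, y, k in zip(predictions, labels, keep_mask) if k]
    # With no kept labels there is nothing to supervise, so return 0.0.
    return sum(kept) / len(kept) if kept else 0.0


# Toy data (assumed values): five candidate grasps, two of which keep a label.
pred = [0.9, 0.2, 0.5, 0.7, 0.1]
label = [1.0, 0.0, 0.0, 0.5, 0.0]
mask = [True, False, False, True, False]
loss = selective_l1_loss(pred, label, mask)  # averages only entries 0 and 3
```

Because the masked-out entries never enter the sum, only the selected labels have to be kept on disk and in memory, which is the intuition behind the storage and memory savings reported above.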
