An Economic Framework for 6-DoF Grasp Detection

Xiao-Ming Wu; Jia-Feng Cai; Jian-Jian Jiang; Dian Zheng; Yi-Lin Wei; Wei-Shi Zheng

TL;DR

The paper tackles the high resource cost and slow convergence of dense-supervision-based 6-DoF grasp detection by introducing EconomicGrasp, an economic supervision framework. It keeps all grasp views to mitigate label ambiguity and uses a focal representation, consisting of an Interactive Grasp Head and a Composite Score Estimation, to learn a specific grasp efficiently under sparse supervision. A three-pronged strategy (grasp pose pruning, scene-level label aggregation, and selective loss) reduces label storage from 55 GB to 1.6 GB and cuts training time and memory cost, while improving performance by about 3 AP on average on GraspNet-1Billion. Real-world tests and ablations corroborate the method's robustness and efficiency, which matters most for resource-constrained robotic grasping. The authors release their code.
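The summary above names a selective loss but does not define it. As a minimal sketch, assuming it means the loss is averaged only over points that still carry a label after pruning, the idea could look like the following. The function name `selective_l1_loss`, the L1 form, and the boolean mask are illustrative assumptions, not the authors' implementation.

```python
def selective_l1_loss(predictions, targets, has_label):
    """Mean absolute error over labeled points only.

    Hypothetical helper: points whose mask entry is False contribute
    nothing to the loss, mimicking sparse (economic) supervision.
    """
    if not (len(predictions) == len(targets) == len(has_label)):
        raise ValueError("inputs must have the same length")

    total, count = 0.0, 0
    for pred, target, keep in zip(predictions, targets, has_label):
        if keep:  # only supervised points are accumulated
            total += abs(pred - target)
            count += 1

    # With no labeled point there is nothing to supervise.
    return total / count if count else 0.0


# Toy example: four points, only the first and last keep a label.
preds = [0.9, 0.2, 0.5, 0.7]
labels = [1.0, 0.0, 0.0, 0.5]
mask = [True, False, False, True]
loss = selective_l1_loss(preds, labels, mask)
```

In this toy case only two of the four residuals enter the average, so unlabeled points neither add error nor dilute it.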

Abstract

Robotic grasping in clutter is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection that aims to economize the resource cost of training while maintaining effective grasp performance. We first observe that dense supervision is the bottleneck of current SOTA methods: it severely increases the training overhead and makes training difficult to converge. To solve this problem, we propose an economic supervision paradigm for efficient and effective grasping. This paradigm includes a well-designed supervision selection strategy, which selects key labels that are largely free of ambiguity, and an economic pipeline that enables training after selection. Furthermore, benefiting from the economic supervision, we can focus on a specific grasp, and thus we devise a focal representation module, which comprises an interactive grasp head and a composite score estimation, to generate the specific grasp more accurately. Combining these components yields the EconomicGrasp framework. Our extensive experiments show that EconomicGrasp surpasses the SOTA grasp method by about 3 AP on average at extremely low resource cost: about 1/4 of the training time, 1/8 of the memory, and 1/30 of the storage. Our code is available at https://github.com/iSEE-Laboratory/EconomicGrasp.

Paper Structure (28 sections, 7 figures, 11 tables)

Introduction
Revisiting 6-DoF Grasp Detection
Task Definition.
Development of 6-DoF Grasp Detection.
The Economic Grasp Framework
A Vanilla Grasp Framework
Economic Supervision
Ambiguity Problem.
Economic Supervision Paradigm.
Focal Representation under Economic Supervision
Interactive Grasp Head.
Composite Score Estimation.
Framework Overall
Dataset and Details
Dataset.
...and 13 more sections

Figures (7)

Figure 1: Economic supervision vs. dense supervision. In our economic framework, the resource cost is minimal and the training converges easily. Moreover, with our careful design for economic supervision, our framework achieves better performance than the SOTA dense supervision method GSNet [wang2021graspness]. "ep" means epochs. All costs are measured on an otherwise idle machine with one NVIDIA RTX 3090 GPU for a fair comparison. Models are trained and tested on GraspNet-1Billion [fang2020graspnet] with Kinect data.
Figure 2: Task definition and the grasp pose. The input for this task is a single-view point cloud from a depth camera, and the model aims to output successful 6-DoF grasp poses for the input scene.
Figure 3: At a given point, there can be many good grasps with different poses. Reducing the supervision may therefore cause ambiguity in the learning process.
Figure 4: Interactive grasp head.
Figure 5: Framework overview. Benefiting from the economic supervision and focal representation, our economic framework achieves effective performance at low cost. The main contributions of the paper are highlighted in boldface.
...and 2 more figures
