Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes

Siang Chen, Wei Tang, Pengwei Xie, Wenming Yang, Guijin Wang

TL;DR

This work tackles efficient 6-DoF grasp detection in clutter with a heatmap-guided framework that infers in a global-to-local, semantic-to-point manner. It jointly learns Grasp Heatmap Modeling (GHM) and a Non-uniform Multi-Grasp Generator (NMG): Gaussian-encoded heatmaps and grid-based attribute prediction guide where to look, and a novel rotation anchor-shifting mechanism produces dense, high-quality grasps in real time. The method achieves state-of-the-art performance on the TS-ACRONYM and GraspNet-1Billion benchmarks, and real-robot experiments confirm its robustness with a 94% success rate and a 100% clutter completion rate. By concentrating computation on regions of interest and fusing semantic cues with local geometry, the approach enables fast, scalable grasp generation and could be extended to closed-loop, multi-view grasping systems.

Abstract

Fast and robust object grasping in clutter is a crucial component of robotics. Most current methods feed the whole observed point cloud into 6-DoF grasp generation and ignore the guidance that can be extracted from global semantics, which limits both grasp quality and real-time performance. In this work, we show that the efficiency benefits of widely used heatmaps for 6-DoF grasp generation have been underestimated. We therefore propose an effective local grasp generator guided by grasp heatmaps, which infers in a global-to-local, semantic-to-point manner. Specifically, Gaussian encoding and a grid-based strategy are used to predict grasp heatmaps, which both aggregate local points into graspable regions and provide global semantic information. Further, a novel non-uniform anchor sampling mechanism is designed to improve grasp accuracy and diversity. Because encoding is performed efficiently in image space and only points in local graspable regions are processed, our framework performs high-quality grasp detection in real time and achieves state-of-the-art results. In addition, real-robot experiments demonstrate the effectiveness of our method, with a success rate of 94% and a clutter completion rate of 100%. Our code is available at https://github.com/THU-VCLab/HGGD.
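The Gaussian and grid-based heatmap encoding mentioned in the abstract can be illustrated with a small NumPy sketch. This is a hypothetical illustration, not the HGGD implementation: the helper names, the kernel width `sigma`, the cell size `grid`, the per-cell averaging rule, and the greedy peak picking are all assumptions. `gaussian_confidence_heatmap` places a 2D Gaussian at each ground-truth grasp center (u, v), `grid_attribute_map` stores one attribute (for example width w or depth d) per coarse cell, and `top_k_peaks` turns a heatmap into a few region centers that a local stage could consume.

```python
import numpy as np


def gaussian_confidence_heatmap(centers, shape, sigma=2.0):
    """Render a confidence heatmap with one 2D Gaussian per grasp center.

    Hypothetical helper; sigma is an assumed value, not taken from the paper.
    Overlapping Gaussians are merged with an element-wise max, so peaks stay at 1.0.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (v) and column (u) index grids
    heat = np.zeros((h, w), dtype=np.float32)
    for u, v in centers:  # u = column (x), v = row (y), in pixels
        g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat


def grid_attribute_map(centers, values, shape, grid=4):
    """Grid-based attribute encoding (assumed rule): each grid x grid cell stores
    the mean attribute of the grasp centers that fall into it; empty cells are 0."""
    h, w = shape
    gh, gw = h // grid, w // grid
    acc = np.zeros((gh, gw), dtype=np.float32)
    cnt = np.zeros((gh, gw), dtype=np.float32)
    for (u, v), val in zip(centers, values):
        r, c = int(v) // grid, int(u) // grid
        if 0 <= r < gh and 0 <= c < gw:
            acc[r, c] += val
            cnt[r, c] += 1
    # Divide only where a cell received at least one grasp.
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)


def top_k_peaks(heat, k=2, nms_radius=3):
    """Greedy peak picking: take the global max, zero a square neighbourhood
    around it, and repeat up to k times. Returns (u, v) pixel coordinates."""
    heat = heat.copy()
    peaks = []
    for _ in range(k):
        v, u = np.unravel_index(np.argmax(heat), heat.shape)
        if heat[v, u] <= 0:
            break  # nothing left above zero
        peaks.append((int(u), int(v)))
        heat[max(0, v - nms_radius): v + nms_radius + 1,
             max(0, u - nms_radius): u + nms_radius + 1] = 0
    return peaks
```

Merging peaks with an element-wise max rather than a sum keeps overlapping grasps bounded at 1.0, a common choice for keypoint-style heatmaps; the released code at the repository above remains the authoritative reference for the actual encoding.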

Paper Structure (23 sections, 4 equations, 8 figures, 5 tables, 1 algorithm)

Figures (8)

  • Figure 1: The key insight of our method is to generate grasp heatmaps that guide regional geometric feature mining and subsequent grasp pose generation by a novel local grasp generator.
  • Figure 2: Proposed grasp representation as $(u,v,\theta,w,d,\gamma,\beta)$.
  • Figure 3: The architecture of HGGD. Taking a monocular RGBD image as input, GHM generates the grasp confidence heatmap $Q_c$ and gridded attribute heatmaps $(Q_{\theta},Q_w,Q_d)$. NMG then converts the depth image into a point cloud using the camera intrinsics $\mathbf{c}$ and aggregates regions under the guidance of the heatmaps. Feature fusion and the point encoder extract regional features fused with semantic information from GHM. Finally, a multi-grasp generator combined with a novel non-uniform anchor sampling mechanism uses the fused features to output the grasps.
  • Figure 4: Visualization of how the ground-truth 6-DoF grasps are projected. The grasp confidence heatmap ${\hat{Q}_{c}}$ and attribute heatmaps $(\hat{Q}_{\theta},\hat{Q}_{w},\hat{Q}_{d})$ are encoded with a Gaussian kernel and grids, respectively.
  • Figure 5: The pipeline of local region feature extraction with semantic-to-point feature fusion.
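Figure 3 describes converting the depth image into a point cloud with the camera intrinsics and then aggregating points into regions under heatmap guidance. The sketch below assumes a standard pinhole camera with intrinsics (fx, fy, cx, cy) and depth in meters, and uses a simple radius (ball) query around the 3D point under a heatmap peak as one plausible aggregation rule. `depth_to_points`, `aggregate_region`, `radius`, and `max_points` are hypothetical names and values; HGGD's own region aggregation may differ.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) to camera-frame points with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v = row index, u = column index
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (H, W, 3), same pixel layout as depth


def aggregate_region(points, center_uv, radius=0.05, max_points=64):
    """Ball query (assumed rule): gather valid points within `radius` meters of
    the 3D point under a heatmap peak, nearest first, capped at `max_points`."""
    u, v = center_uv
    anchor = points[v, u]
    flat = points.reshape(-1, 3)
    valid = flat[:, 2] > 0  # drop pixels with missing depth
    dist = np.linalg.norm(flat - anchor, axis=1)
    idx = np.flatnonzero(valid & (dist <= radius))
    idx = idx[np.argsort(dist[idx])][:max_points]  # keep the nearest points
    return flat[idx] - anchor  # coordinates relative to the region center
```

Returning coordinates relative to the region center is one common way to make local point features translation-invariant, so a shared point encoder can process every region in the same way.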