Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping

Pengwei Xie, Siang Chen, Wei Tang, Dingchang Hu, Wenming Yang, Guijin Wang

TL;DR

This work rethinks 6-Dof grasp detection from a grasp-centric view and proposes a versatile grasp framework capable of handling both scene-level and target-oriented grasping, achieving improvements of over 18% and 23% on the unseen splits of the GraspNet-1Billion Dataset.

Abstract

Robotic grasping is a primitive skill for complex tasks and is fundamental to intelligence. For general 6-Dof grasping, most previous methods directly extract scene-level semantic or geometric information, while few of them consider suitability for downstream applications such as target-oriented grasping. To address this issue, we rethink 6-Dof grasp detection from a grasp-centric view and propose a versatile grasp framework capable of handling both scene-level and target-oriented grasping. Our framework, FlexLoG, is composed of a Flexible Guidance Module and a Local Grasp Model. Specifically, the Flexible Guidance Module is compatible with both global (e.g., grasp heatmap) and local (e.g., visual grounding) guidance, enabling the generation of high-quality grasps across various tasks. The Local Grasp Model focuses on object-agnostic regional points and predicts grasps within each local region. Experimental results reveal that our framework achieves improvements of over 18% and 23% on the unseen splits of the GraspNet-1Billion Dataset. Furthermore, real-world robotic tests in three distinct settings yield a 95% success rate.

Paper Structure (18 sections, 4 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Our framework can be flexibly integrated with global or local guidance methods for different application scenarios. Global guidance, such as grasp heatmap, can be utilized to generate scene-level grasps. Local guidance, such as object detection, can be utilized to generate target-oriented grasps.
  • Figure 2: A: The region is cropped from the scene point cloud in the camera frame. B: The local neighbor points are transformed to the local region frame. C: The regional grasp is represented as $(\theta, \gamma, \beta, w, \Delta x, \Delta y, \Delta z)$.
  • Figure 3: The architecture of FlexLoG. Taking a monocular observation image as input, the Flexible Guidance Module (FGM) utilizes different guidance methods (e.g., grasp heatmap for global guidance and object detection for local guidance) to identify potential graspable areas and sample points as regional centers. These points are then clustered into multiple local regions. The Local Grasp (LoG) Model then extracts geometric features and predicts grasps. Depending on the guidance method used in the FGM, the output is either scene-level or target-oriented grasps.
  • Figure 4: Local grasp scores can be spliced to form a grasp heatmap, illustrating the graspability. As the number of sampled centers $K$ increases, so does the heatmap's resolution, leading to a more accurate depiction of the graspable areas.
  • Figure 5: The proposed lightweight PointMLP-based encoder structure of the Local Grasp Model.
  • ...and 3 more figures
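
The region cropping and frame change described in the Figure 2 caption (steps A and B) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the spherical crop with a fixed `radius`, and the choice of a region frame whose z-axis points along the camera ray through the region center are all assumptions made for illustration, since the caption does not specify them.

```python
import numpy as np

def region_frame_from_center(center):
    """Build a hypothetical region frame for a center given in the camera frame.

    Assumption: the z-axis points along the camera ray through the center.
    The columns of the returned 3x3 matrix are the region axes (x, y, z)
    expressed in camera coordinates.
    """
    z = center / np.linalg.norm(center)
    up = np.array([0.0, 1.0, 0.0])
    # Fall back to another reference axis if `up` is nearly parallel to z.
    if abs(np.dot(up, z)) > 0.99:
        up = np.array([1.0, 0.0, 0.0])
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)

def crop_region(points, center, radius):
    """Step A: keep scene points within `radius` of the region center."""
    dist = np.linalg.norm(points - center, axis=1)
    return points[dist <= radius]

def to_region_frame(points, center, rotation):
    """Step B: express camera-frame points in the local region frame."""
    # For row vectors, (p - c) @ R equals R^T (p - c) for each point.
    return (points - center) @ rotation

# Toy scene: two points near the center and one far away.
scene = np.array([[0.00, 0.0, 1.00],
                  [0.05, 0.0, 1.00],
                  [1.00, 1.0, 1.00]])
center = np.array([0.0, 0.0, 1.0])
rotation = region_frame_from_center(center)
local_points = to_region_frame(crop_region(scene, center, 0.1), center, rotation)
```

Expressing each region relative to its own center and axes means the points handed to a local model no longer depend on where the region sits in the scene, which is in keeping with the object-agnostic, region-level view the abstract describes.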