PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

Yaofeng Cheng, Fusheng Zha, Wei Guo, Pengfei Wang, Chao Zeng, Lining Sun, Chenguang Yang

TL;DR

The paper tackles the problem of incomplete geometry in single-view depth point clouds for 6-DoF grasping. It introduces Completion Feature Grasp (CFG), which converts point completion results into a hidden-space shape feature (via a PCF-Layer). This feature augments the grasp network, which still relies on the original points. A Score Filter then selects executable grasps by accounting for robot motion feasibility, bridging the gap between network output and real-world execution. Real-world experiments show clear gains over state-of-the-art methods and robustness across viewpoints and clutter, with an 89% success rate in real robot grasping and an improvement of about 18% over the prior best result.
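To make the idea of "converting completion points into a feature attached to the original points" concrete, here is a minimal NumPy sketch. It is not the authors' PCF-Layer. It assumes a PointNet++-style grouping step: the original and completion points are concatenated, each original point gathers its k nearest neighbours from the merged cloud, and a shared per-point layer followed by max-pooling yields one feature per original point. The function name `pcf_layer_sketch`, the neighbourhood size `k`, and the random, untrained weights are all hypothetical placeholders.

```python
import numpy as np


def pcf_layer_sketch(original, completion, k=8, feat_dim=16, seed=0):
    """Hypothetical PCF-Layer-style mapping (illustrative only, not the paper's layer).

    original:   (N, 3) single-view 2.5D points
    completion: (M, 3) points predicted by a completion network
    returns:    (N, feat_dim) shape feature indexed by the ORIGINAL points
    """
    rng = np.random.default_rng(seed)
    # Concatenate original and completion points: (N + M, 3).
    merged = np.concatenate([original, completion], axis=0)
    # Squared distances from every original point to every merged point: (N, N + M).
    d2 = ((original[:, None, :] - merged[None, :, :]) ** 2).sum(-1)
    # k nearest merged points for each original point: (N, k).
    idx = np.argsort(d2, axis=1)[:, :k]
    # Neighbour offsets relative to the query point: (N, k, 3).
    offsets = merged[idx] - original[:, None, :]
    # Shared per-point layer (random placeholder weights, one linear layer + ReLU).
    W = rng.standard_normal((3, feat_dim))
    h = np.maximum(offsets @ W, 0.0)  # (N, k, feat_dim)
    # Symmetric max-pooling over each neighbourhood gives one feature per original point.
    return h.max(axis=1)


rng = np.random.default_rng(1)
F = pcf_layer_sketch(rng.random((100, 3)), rng.random((200, 3)))
print(F.shape)  # (100, 16)
```

Because the output stays indexed by the original points, the grasp network can take it alongside those points and keep predicting grasps on observed geometry, while the completion points only shape the features.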

Abstract

The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on point clouds (2.5D points) generated from single-view depth images. These point clouds capture only one surface side of the object and therefore provide incomplete geometry information, which misleads the grasping algorithm when it judges the shape of the target object and results in low grasping accuracy. Humans can accurately grasp objects from a single view by leveraging their geometric experience to estimate object shapes. Inspired by this, we propose a novel 6-DoF grasping framework that converts point completion results into object shape features used to train the 6-DoF grasp network. Here, point completion generates approximately complete points from the 2.5D points, analogous to human geometric experience, and converting these results into shape features is how we use them to improve grasping efficiency. Furthermore, because of the gap between network generation and actual execution, we integrate a score filter into our framework to select grasp proposals that the real robot can more readily execute. This enables our method to maintain high grasp quality from any camera viewpoint. Extensive experiments demonstrate that complete point features enable the generation of significantly more accurate grasp proposals, and that including a score filter greatly enhances the reliability of real-world robot grasping. Our method achieves a success rate 17.8% higher than the state-of-the-art method in real-world experiments.
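The score filter rescales network scores according to how the robot arm is positioned relative to each grasp pose. The following is a minimal sketch of that kind of re-ranking, not the paper's filter. It assumes a simple feasibility heuristic: a grasp is favoured when its approach vector points away from the robot base (that is, the gripper arrives from the robot's side), and it is zeroed when it lies outside an assumed reach radius. The names `score_filter_sketch`, `max_reach`, and the linear weighting are hypothetical. The defaults mirror the pipeline figure's selection of the top 20 out of 1024 candidates.

```python
import numpy as np


def score_filter_sketch(scores, positions, approaches, base=(0.0, 0.0, 0.0),
                        max_reach=0.85, top_k=20):
    """Hypothetical feasibility-aware re-ranking of grasp proposals.

    scores:     (G,) network grasp scores
    positions:  (G, 3) grasp positions in the robot base frame
    approaches: (G, 3) gripper approach vectors (pointing toward the object)
    returns:    indices of the top_k adjusted grasps, and all adjusted scores
    """
    scores = np.asarray(scores, dtype=float)
    to_grasp = np.asarray(positions, dtype=float) - np.asarray(base, dtype=float)
    dist = np.linalg.norm(to_grasp, axis=1)
    dir_to_grasp = to_grasp / np.maximum(dist[:, None], 1e-9)
    a = np.asarray(approaches, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    # Cosine between approach and base->grasp direction; 1 means "approach from the robot side".
    align = np.sum(a * dir_to_grasp, axis=1)
    # Map alignment from [-1, 1] to a weight in [0, 1].
    weight = (align + 1.0) / 2.0
    # Assumed hard reach limit: grasps beyond it are treated as non-executable.
    weight[dist > max_reach] = 0.0
    adjusted = scores * weight
    # Highest adjusted score first.
    order = np.argsort(-adjusted, kind="stable")
    return order[:top_k], adjusted
```

A real filter would more likely query inverse kinematics or a motion planner. The heuristic is kept this simple so that the re-ranking step itself stays easy to follow.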

Paper Structure

This paper contains 13 sections, 8 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Difference between previous methods and our method. The computer was not part of the point completion training dataset, so its completion result is only approximate. Its screen is also reflective, so the depth camera loses points there. (A) The end-to-end method generates wrong proposals at the bottom of the screen. (B) The existing completion method generates wrong proposals on completed points where no object actually exists. (C) Our method remains robust by using the point completion results as shape information, which leads the network to generate precise grasps on the original points.
  • Figure 2: Pipeline of the proposed framework. (a) The Completion Feature Grasp Module is the network prediction stage, which predicts reasonable grasp proposals. The completion points and the original points are concatenated and fed into the PCF-Layer, which converts them into shape features for grasp training and thereby gives the grasp prediction more object shape information. (b) The score filter adjusts the grasp scores predicted by the network based on the positional relationship between the robot arm and the grasp pose, so that the selected grasp poses suit the robot arm better. (c) The visualized grasps are the 20 highest-scoring grasps selected for the robot from 1024 grasp candidates.
  • Figure 3: The grasp representation. $c$ denotes the grasp contact point. Vector $\mathbf{a}$ is the gripper approach direction, and $\mathbf{b}$ is the grasp baseline direction. $w$ is the predicted grasp width, and $d$ is the distance from the gripper base frame to the grasp baseline. The five orange points on the green gripper represent the five gripper points $\mathbf v \in \mathbb R^{5 \times 3}$.
  • Figure 4: PCF-Layer mechanism. (a) Unlike the original points, which represent only surface information, the completion points contain the object's approximate spatial shape. The PCF-Layer therefore provides the spatial shape features needed for grasping. Note that the purple outlines for shape information and features are hand-drawn for illustration only. (b) The PCF-Layer maps the concatenated points to the original points. (c) The PCF-Layer uses point feature learning blocks (drawing adapted from qi2017pointnet++, qi2017pointnet) to extract object shape information for grasp training.
  • Figure 5: The grasp network. The PCF-Layer output $\mathcal{F}$ and the original points are fed together into the point encoder, which learns complete-point features. Four MLP decoder heads then generate the grasp elements $\mathbf {z}_1$ and $\mathbf {z}_2$, the grasp widths $\hat{w}_i$, and the grasp scores $\hat{s}$.
  • ...and 8 more figures
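The grasp representation in Figure 3 and the decoder outputs in Figure 5 can be tied together in a short sketch. It assumes the Contact-GraspNet convention, which this representation appears to follow: $\mathbf{z}_1$ and $\mathbf{z}_2$ are orthonormalised into the baseline $\mathbf{b}$ and approach $\mathbf{a}$ by Gram-Schmidt, the rotation has columns $[\mathbf{b}, \mathbf{a}\times\mathbf{b}, \mathbf{a}]$, and the gripper base sits at $t = c + \tfrac{w}{2}\mathbf{b} - d\,\mathbf{a}$. The layout of the five keypoints (wrist, two finger roots, two fingertips) and `finger_len` are hypothetical; the paper's exact gripper geometry is not given here.

```python
import numpy as np


def decode_axes(z1, z2):
    """Gram-Schmidt: turn raw head outputs into unit baseline b and approach a (a orthogonal to b)."""
    b = z1 / np.linalg.norm(z1)
    a = z2 - np.dot(b, z2) * b
    a = a / np.linalg.norm(a)
    return a, b


def grasp_pose(c, a, b, w, d):
    """Assumed Contact-GraspNet-style pose: R = [b, a x b, a], t = c + (w/2) b - d a."""
    R = np.stack([b, np.cross(a, b), a], axis=1)  # columns are the gripper x, y, z axes
    t = c + 0.5 * w * b - d * a
    return R, t


def gripper_points(R, t, w, d, finger_len=0.05):
    """Five hypothetical gripper keypoints in the world frame, shape (5, 3)."""
    v = np.array([
        [0.0, 0.0, 0.0],                  # wrist (gripper base-frame origin)
        [0.5 * w, 0.0, d - finger_len],   # finger root, +b side
        [-0.5 * w, 0.0, d - finger_len],  # finger root, -b side
        [0.5 * w, 0.0, d],                # fingertip, +b side
        [-0.5 * w, 0.0, d],               # fingertip, -b side (maps back to contact c)
    ])
    return v @ R.T + t
```

With this convention, the last keypoint lands exactly on the contact point $c$, and the two fingertips are $w$ apart along $\mathbf{b}$. This makes it a convenient self-check when adapting the sketch to a real gripper model.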