Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes
Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter Fox
TL;DR
This work tackles 6-DoF grasp generation for unknown objects in cluttered scenes with Contact-GraspNet, an end-to-end network that predicts grasps directly from depth data. It uses a novel contact-point grasp representation that anchors each grasp to an observed surface point. This reduces the learned representation from a full 6-DoF $SE(3)$ pose plus gripper width to 4-DoF, enabling efficient, diverse, collision-aware grasp generation. Trained on $17.7$ million simulated grasps from the ACRONYM dataset with a PointNet++-based architecture, the method runs fast inference (0.28 s per scene). In real-robot experiments on unseen objects in structured clutter, it achieves over $90\%$ success and cuts the failure rate of a recent state-of-the-art method in half. The approach is robust to imperfect segmentation, supports local ROI processing, and facilitates reactive closed-loop grasping in cluttered environments, a significant step toward reliable autonomous manipulation in unstructured settings.
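The 4-DoF figure follows from a simple count. Here we assume a grasp is described by a contact point $\mathbf{c}$, a unit baseline direction $\mathbf{b}$ between the fingers, a unit approach direction $\mathbf{a} \perp \mathbf{b}$, and an opening width $w$. This notation is ours, not the summary's.

$$
\begin{aligned}
\underbrace{6}_{SE(3)\ \text{pose}} + \underbrace{1}_{w} &= 7 \\
7 - \underbrace{3}_{\mathbf{c}\ \text{fixed to an observed point}} &= 4 = \underbrace{2}_{\mathbf{b} \in S^2} + \underbrace{1}_{\mathbf{a} \perp \mathbf{b}} + \underbrace{1}_{w}
\end{aligned}
$$

Tying the contact to a measured point removes the three translational degrees of freedom. The network then only has to predict the gripper orientation (two directions) and the width.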
Abstract
Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that have several potential failure points and runtimes unsuitable for closed-loop grasping. Therefore, we propose an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts. By rooting the full 6-DoF grasp pose and width in the observed point cloud, we can reduce the dimensionality of our grasp representation to 4-DoF, which greatly facilitates the learning process. Our class-agnostic approach is trained on 17 million simulated grasps and generalizes well to real-world sensor data. In a robotic grasping study of unseen objects in structured clutter, we achieve over 90% success rate, cutting the failure rate in half compared to a recent state-of-the-art method.
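The following sketch shows how a grasp rooted at an observed contact point can be expanded back into a full 6-DoF gripper pose. It is a minimal illustration under our own assumptions, not the authors' implementation. We assume each contact point `c` comes with a predicted approach direction `a`, a baseline direction `b` pointing toward the opposite finger, and a width `w`. We also assume the gripper frame origin sits a fixed, gripper-specific distance `d` behind the finger baseline. The function name `contact_to_grasp` and the default `d=0.1` are hypothetical.

```python
import numpy as np


def contact_to_grasp(c, a, b, w, d=0.1):
    """Expand a contact-point grasp (c, a, b, w) into a 4x4 gripper pose.

    c: (3,) observed contact point on the point cloud.
    a: (3,) approach direction (pointing from the gripper toward the object).
    b: (3,) baseline direction from c toward the opposite finger contact.
    w: gripper opening width, in the same units as c.
    d: HYPOTHETICAL offset from the finger baseline back to the gripper origin.
    """
    c = np.asarray(c, dtype=float)
    b = np.asarray(b, dtype=float)
    b = b / np.linalg.norm(b)
    a = np.asarray(a, dtype=float)
    # Remove any component of a along b so the two axes are exactly orthogonal.
    a = a - np.dot(a, b) * b
    a = a / np.linalg.norm(a)
    # Rotation columns: x = baseline, y = a x b, z = approach (right-handed).
    R = np.column_stack([b, np.cross(a, b), a])
    # Baseline midpoint lies w/2 from c along b; the gripper origin is d behind it.
    t = c + 0.5 * w * b - d * a
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


# Example: a top-down grasp (approach along -z) at the origin with a 4 cm opening.
T = contact_to_grasp([0, 0, 0], [0, 0, -1], [1, 0, 0], 0.04, d=0.1)
print(T[:3, 3])  # gripper origin 0.1 above the contact, centred between fingers
```

The translation is derived from the observed point rather than regressed freely. Only the two directions and the width remain to be predicted, which is the 4-DoF reduction the abstract describes.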
