Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering

Snehal Jauhri, Ishikaa Lunawat, Georgia Chalvatzaki

TL;DR

This work reinterprets grasping as rendering and introduces NeuGraspNet, a novel method for 6DoF grasp detection that leverages advances in neural volumetric representations and surface rendering.

Abstract

A significant challenge for real-world robotic manipulation is the effective 6DoF grasping of objects in cluttered scenes from any single viewpoint without the need for additional scene exploration. This work reinterprets grasping as rendering and introduces NeuGraspNet, a novel method for 6DoF grasp detection that leverages advances in neural volumetric representations and surface rendering. It encodes the interaction between a robot's end-effector and an object's surface by jointly learning to render the local object surface and learning grasping functions in a shared feature space. The approach uses global (scene-level) features for grasp generation and local (grasp-level) neural surface features for grasp evaluation. This enables effective, fully implicit 6DoF grasp quality prediction, even in partially observed scenes. NeuGraspNet operates on random viewpoints, common in mobile manipulation scenarios, and outperforms existing implicit and semi-implicit grasping methods. The real-world applicability of the method has been demonstrated with a mobile manipulator robot, grasping in open, cluttered spaces. Project website at https://sites.google.com/view/neugraspnet

Paper Structure

This paper contains 29 sections, 5 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: NeuGraspNet: A single-view 3D Truncated Signed Distance Field (TSDF) grid is processed through a convolutional occupancy network to reconstruct the scene (cf. \ref{subsec:scene}). The occupancy network is used to perform global, scene-level rendering. The rendered scene is used for grasp candidate generation in SE(3) (cf. \ref{subsec:gpg}). We re-interpret grasping as rendering of local surface points and query their features from the shared 3D feature volume. Local points, their features, and the 6DoF grasp pose are passed to a Grasping PointNetwork to predict per-grasp quality (cf. \ref{subsec:local}). NeuGraspNet effectively learns the interaction between the objects' geometry and the gripper to detect high-fidelity grasps.
  • Figure 2: Scene-level surface rendering: (a) an input single-view pointcloud; (b) surface rendering on the neural implicit geometry (grey volume) using 6 'virtual' cameras; (c) the reconstructed surface pointcloud; (d) sampled grasp candidates.
  • Figure 3: Local surface rendering: (a) rendering the neural implicit geometry by ray-marching 3 'virtual' cameras at the three parts of the gripper (gripper used here only for visualization); (b) the neural rendered surface; (c) noisy ground-truth rendered surface used during training for local occupancy supervision (light pink points are unoccupied and dark red points are occupied); (d) ground-truth simulated scene.
  • Figure 4: Example scene reconstructions & detected grasps for unseen test objects from the VGN (Breyer et al., 2020) (top) and the EGAD (Morrison et al., 2020) (bottom) datasets. Our network sometimes creates artifacts or fails to reconstruct very fine details, especially for the hard EGAD objects. Nevertheless, even in these hard cases, it reconstructs the broad structure of the scene & objects, which results in the detection of good grasps (d).
  • Figure 5: Example failure cases observed in (a) simulated and (b) real-world experiments.
  • ...and 4 more figures
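
The captions of Figures 2 and 3 describe surface rendering as ray-marching 'virtual' cameras through the neural implicit geometry. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a toy analytic occupancy function (a sphere) in place of the learned occupancy network, a single orthographic virtual camera, and a fixed step count and threshold; the names `occupancy` and `render_surface` and all parameter values are hypothetical.

```python
import numpy as np

def occupancy(points):
    """Toy stand-in for a learned occupancy network (assumption):
    returns 1.0 inside a sphere of radius 0.5 at the origin, else 0.0."""
    return (np.linalg.norm(points, axis=-1) <= 0.5).astype(np.float32)

def render_surface(origins, directions, occupancy_fn,
                   near=0.0, far=3.0, n_steps=300, threshold=0.5):
    """March every ray from `near` to `far` and return, for each ray that
    hits, the first sample whose occupancy exceeds `threshold`."""
    depths = np.linspace(near, far, n_steps)                       # (S,)
    # Sample positions along each ray: shape (R, S, 3)
    samples = (origins[:, None, :]
               + depths[None, :, None] * directions[:, None, :])
    occupied = occupancy_fn(samples) > threshold                   # (R, S)
    hit = occupied.any(axis=1)                                     # (R,)
    first = occupied.argmax(axis=1)        # index of first occupied sample
    surface = samples[np.arange(origins.shape[0]), first]          # (R, 3)
    return surface[hit], hit

# One orthographic 'virtual' camera at z = -2 looking along +z.
xs = np.linspace(-1.0, 1.0, 21)
gx, gy = np.meshgrid(xs, xs)
origins = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, -2.0)], axis=1)
directions = np.tile(np.array([0.0, 0.0, 1.0]), (origins.shape[0], 1))

surface_points, hit_mask = render_surface(origins, directions, occupancy)
print(surface_points.shape, int(hit_mask.sum()))
```

Taking the first occupied sample along each ray yields points on the visible side of the implicit surface only, so a full surface point cloud needs several cameras; the captions mention 6 virtual cameras at scene level and 3 at grasp level. A finer step size or a root-finding refinement between the last free and first occupied sample would tighten the surface estimate.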