
Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects

Jeffrey Ichnowski, Yahav Avigal, Justin Kerr, Ken Goldberg

TL;DR

Dex-NeRF demonstrates that Neural Radiance Fields can recover the geometry of transparent objects well enough to support robust robotic grasping. By rendering a transparency-aware depth map from NeRF and feeding it into Dex-Net, and by strategically placing lights to induce informative specular reflections, the method substantially improves grasp success on transparent objects. The work contributes (i) a NeRF-based pipeline integrated with robot grasp planning, (ii) a depth-rendering approach tailored for transparency, and (iii) synthetic and real datasets capturing transparent scenes; physical experiments on an ABB YuMi yield 90–100% grasp success, outperforming baselines. This approach broadens automated manipulation capabilities in cluttered environments rich in transparent objects, such as kitchens and warehouses, and highlights practical considerations for camera arrays in robot workcells.

Abstract

The ability to grasp and manipulate transparent objects is a major challenge for robots. Existing depth cameras have difficulty detecting, localizing, and inferring the geometry of such objects. We propose using neural radiance fields (NeRF) to detect, localize, and infer the geometry of transparent objects with sufficient accuracy to find and grasp them securely. We leverage NeRF's view-independent learned density, place lights to increase specular reflections, and perform transparency-aware depth rendering that we feed into the Dex-Net grasp planner. We show how additional lights create specular reflections that improve the quality of the depth map, and test a setup for a robot workcell equipped with an array of cameras to perform transparent object manipulation. We also create synthetic and real datasets of transparent objects in real-world settings, including singulated objects, cluttered tables, and the top rack of a dishwasher. In each setting we show that NeRF and Dex-Net are able to reliably compute robust grasps on transparent objects, achieving 90% and 100% grasp success rates in physical experiments on an ABB YuMi, on objects where baseline methods fail.
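
The difference between a vanilla NeRF depth map and a transparency-aware one can be sketched for a single camera ray. The sketch below assumes (our reading, not a quotation of the paper's equations) that vanilla depth is the weighted sum of sample distances produced by standard volume rendering, while the transparency-aware variant returns the first sample whose learned density exceeds a threshold m. The function names, the toy density values, and the threshold are hypothetical.

```python
import math
from typing import List, Optional


def expected_depth(t: List[float], sigma: List[float]) -> float:
    """Vanilla-NeRF-style depth for one ray: sum_i w_i * t_i.

    w_i = T_i * (1 - exp(-sigma_i * delta_i)), where T_i is the transmittance
    accumulated over earlier samples and delta_i is the spacing to the next sample.
    """
    depth = 0.0
    transmittance = 1.0
    for i in range(len(t)):
        # The last sample gets a very large spacing, as in common NeRF code.
        delta = t[i + 1] - t[i] if i + 1 < len(t) else 1e10
        alpha = 1.0 - math.exp(-sigma[i] * delta)
        depth += transmittance * alpha * t[i]
        transmittance *= 1.0 - alpha
    return depth


def threshold_depth(t: List[float], sigma: List[float], m: float) -> Optional[float]:
    """Assumed transparency-aware depth: first sample with density above m.

    Returns None when no sample is dense enough (a hole in the depth map).
    """
    for ti, si in zip(t, sigma):
        if si > m:
            return ti
    return None


# Toy ray (hypothetical values): faint glass at t = 1.0, opaque table at t = 3.0.
t = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
sigma = [0.0, 0.4, 0.0, 0.0, 0.0, 100.0]
print(round(expected_depth(t, sigma), 2))  # 2.64
print(threshold_depth(t, sigma, m=0.3))    # 1.0
```

On this toy ray, the faint glass surface barely moves the weighted sum, which lands near the opaque table, whereas the threshold rule reports the glass itself. Because a low-density surface contributes little weight, a depth estimate driven by accumulated weight tends to look through transparent objects; a first-crossing rule does not.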

Paper Structure

This paper contains 19 sections, 2 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Using NeRF to grasp transparent objects. Given a scene with transparent objects (left column), we use the pipeline on the right to compute grasps (middle column). The top row shows Dex-NeRF working in a simulated scene, while the bottom row shows it working in a physical scene.
  • Figure 2: Comparison to RealSense Depth Camera. We compare the results of the proposed pipeline in a real-world setting against the depth map produced by an Intel RealSense camera. The left image shows the real-world scene, the middle shows the depth image from the RealSense, and the right shows the result of our pipeline. The color scheme in the RealSense image is provided by the RealSense SDK, while the color scheme in the right column is from Matplotlib. We observe that the RealSense depth camera is unable to recover depth for a large portion of the scene, shown in black. In contrast, the proposed pipeline, despite a few holes, recovers depth for most of the scene.
  • Figure 3: Using NeRF to render depth for grasping transparent objects. Dex-NeRF uses a transparency-aware depth rendering to render depth maps that can be used for grasp planning. In contrast, Vanilla-NeRF's depth maps are filled with holes and result in poor grasp predictions.
  • Figure 4: Synthetic singulated objects used in simulation experiments. Top row: image of the object in the training data. Bottom row: computed depth map and candidate grasp.
  • Figure 5: Grasp success rate vs. training epochs. Whereas view synthesis requires over 200k epochs, we observe high grasp success rates after only 50k to 60k epochs.
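
Figure 2 compares depth coverage visually. One simple way to put a number on that comparison, offered here as an illustrative sketch rather than a metric reported in the paper, is the fraction of pixels that carry a valid depth value. The function name and toy depth maps are hypothetical; treating 0, None, and NaN as holes is an assumption (RealSense depth frames typically use 0 for pixels with no data).

```python
import math
from typing import Optional, Sequence


def valid_depth_fraction(depth: Sequence[Sequence[Optional[float]]]) -> float:
    """Fraction of pixels with a usable depth value.

    Hypothetical convention: None, NaN, and non-positive values count as holes.
    """
    total = 0
    valid = 0
    for row in depth:
        for d in row:
            total += 1
            if d is not None and not math.isnan(d) and d > 0:
                valid += 1
    return valid / total if total else 0.0


# Hypothetical 2x3 depth maps in meters; holes are encoded as 0.0 or None.
camera_like = [[0.0, 0.0, 1.2], [0.0, 0.0, 1.3]]
nerf_like = [[0.9, 1.0, 1.2], [1.1, None, 1.3]]
print(round(valid_depth_fraction(camera_like), 2))  # 0.33
print(round(valid_depth_fraction(nerf_like), 2))    # 0.83
```

Counting several encodings as holes keeps the measure comparable across sources that mark missing depth differently, such as a sensor SDK that writes 0 and a renderer that leaves a pixel empty.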