Residual-NeRF: Learning Residual NeRFs for Transparent Object Manipulation

Bardienus P. Duisterhof; Yuemin Mao; Si Heng Teng; Jeffrey Ichnowski

TL;DR

Residual-NeRF addresses depth perception for transparent objects by using a NeRF of the static background as a scene prior, then learning a residual NeRF plus a Mixnet that blends the two along each ray with a mixing weight $\beta \in [0,1]$. Because the background NeRF is trained on the empty scene first, the residual NeRF only has to learn the changes introduced by the new objects, which reduces ambiguity and accelerates convergence. On nine synthetic Blender scenes, Residual-NeRF achieves a 46.1% lower RMSE and a 29.5% lower MAE than the baselines, and on three real scenes it qualitatively produces fewer depth holes and less noise. It also delivers faster training and more robust grasp planning with Dex-Net. The approach is practical for manipulation in mostly static workspaces, where cleaner depth maps with fewer holes improve grasp reliability.
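The per-ray blending described above can be illustrated with a small NumPy sketch. It assumes, hypothetically, that the Mixnet is a tiny MLP with a sigmoid output (one simple way to keep $\beta$ in $[0,1]$), that $\beta$ weights the background field, and that the blend is a per-sample convex combination of densities and colors followed by standard NeRF volume rendering to an expected ray-termination depth. The paper's exact combination equation and network sizes may differ, and the names `mixnet_beta` and `render_blended_ray` are illustrative, not taken from the authors' code.

```python
import numpy as np


def mixnet_beta(x, w1, b1, w2, b2):
    """Hypothetical Mixnet: a 2-layer MLP whose sigmoid output is beta in [0, 1].

    x: (N, D) per-sample inputs (assumed here to be encoded 3D sample positions).
    """
    hidden = np.maximum(0.0, x @ w1 + b1)       # ReLU hidden layer, shape (N, H)
    logits = (hidden @ w2 + b2)[:, 0]           # one logit per sample, shape (N,)
    return 1.0 / (1.0 + np.exp(-logits))        # sigmoid squashes into [0, 1]


def render_blended_ray(t, sigma_bg, rgb_bg, sigma_res, rgb_res, beta):
    """Blend background and residual fields per sample, then volume-render one ray.

    Assumption: beta weights the background field and the blend is a convex
    combination of densities and colors; the paper's equation may differ.
    t: (N,) sorted sample distances; sigma_*: (N,); rgb_*: (N, 3); beta: scalar or (N,).
    """
    beta = np.clip(np.broadcast_to(np.asarray(beta, dtype=float), t.shape), 0.0, 1.0)
    sigma = beta * sigma_bg + (1.0 - beta) * sigma_res
    rgb = beta[:, None] * rgb_bg + (1.0 - beta[:, None]) * rgb_res

    # Interval lengths between samples; the last interval is treated as unbounded.
    delta = np.append(np.diff(t), 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)                   # per-sample opacity
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray reaches sample i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    depth = float((weights * t).sum())                     # expected termination depth
    return color, depth, weights


# Toy ray: background surface at t = 4.0, residual (object) surface at t = 2.0.
t = np.linspace(0.0, 5.0, 51)
sigma_bg = np.zeros(51)
sigma_bg[40] = 1e3
sigma_res = np.zeros(51)
sigma_res[20] = 1e3
rgb_bg = np.zeros((51, 3))
rgb_res = np.ones((51, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(51, 3))                                # stand-in sample features
beta = mixnet_beta(x, rng.normal(size=(3, 16)), np.zeros(16),
                   rng.normal(size=(16, 1)), np.zeros(1))   # untrained weights
color, depth, weights = render_blended_ray(t, sigma_bg, rgb_bg, sigma_res, rgb_res, beta)
print(depth)
```

On this toy ray, forcing $\beta = 1$ everywhere renders the background surface (depth near 4), while $\beta = 0$ renders the residual surface (depth near 2); a trained Mixnet would instead choose per-sample values so that the residual field only takes over where the scene has changed.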

Abstract

Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown that neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and that these depth maps can be used to grasp transparent objects with high accuracy. However, NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions. In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same area, such as a kitchen. By first learning a background NeRF of the scene without the transparent objects to be manipulated, we reduce the ambiguity of learning the changes introduced by the new objects. We propose training two additional networks: a residual NeRF that learns to infer residual RGB values and densities, and a Mixnet that learns how to combine the background and residual NeRFs. We contribute synthetic and real experiments that suggest Residual-NeRF improves depth perception of transparent objects. The results on synthetic data suggest Residual-NeRF outperforms the baselines with a 46.1% lower RMSE and a 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF leads to more robust depth maps with less noise and fewer holes. Website: https://residual-nerf.github.io
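The RMSE and MAE figures above, and the depth holes that motivate the method, can be made concrete with a short sketch of how such errors are commonly computed. It assumes depth maps in consistent units, ground truth available where `gt > 0`, and predicted holes stored as 0 and penalized rather than masked out. The paper's exact evaluation protocol may differ, and `depth_errors` and `relative_reduction` are hypothetical helper names.

```python
import numpy as np


def depth_errors(pred, gt):
    """Return (RMSE, MAE, hole fraction) over pixels with valid ground truth.

    Assumption: gt > 0 marks valid ground truth, and holes in the predicted
    depth are stored as 0, so they contribute their full error.
    """
    valid = gt > 0
    err = pred[valid] - gt[valid]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    hole_fraction = float(np.mean(pred[valid] <= 0))
    return rmse, mae, hole_fraction


def relative_reduction(baseline_error, method_error):
    """Fraction by which method_error is lower than baseline_error."""
    return (baseline_error - method_error) / baseline_error


# Tiny 2x2 example; the bottom-left pixel has no ground truth and is ignored.
gt = np.array([[1.0, 2.0], [0.0, 4.0]])
pred = np.array([[1.0, 0.0], [3.0, 5.0]])   # the 0.0 entry is a hole
rmse, mae, holes = depth_errors(pred, gt)
print(rmse, mae, holes)
```

Under this definition, a "46.1% lower RMSE" means `relative_reduction(rmse_baseline, rmse_method)` equals 0.461, i.e. the method's RMSE is 53.9% of the baseline's. Penalizing holes instead of masking them keeps a method from looking better simply by leaving difficult pixels empty.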


Paper Structure

This paper contains 22 sections, 12 equations, 8 figures, and 2 tables.

Introduction
Related Work
Problem Statement
Method
Preliminary: Training NeRF
Depth from NeRF
Learning a Residual NeRF
Experiments
Hyperparameters
Baselines
Synthetic Blender Data
Real World Data
Implementation Details
Blender Depth Results
Quantitative Comparison
...and 7 more sections

Figures (8)

Figure 1: Residual-NeRF, a method that leverages mostly static scenes to improve depth perception and speed up training. Residual-NeRF begins by learning a background NeRF of the entire scene without transparent objects. Following this, we learn a residual NeRF and a Mixnet to complement the background NeRF.
Figure 2: Residual-NeRF, a method that leverages mostly static scenes to improve depth perception and speed up training. We first learn a background NeRF of the scene without transparent objects and leverage it as a scene prior. Following this, we learn a residual NeRF and a Mixnet. The Mixnet is an MLP that learns to combine the background NeRF and the residual NeRF. The paper's residual NeRF equation (eq:residual_nerf) describes how the output of the Mixnet MLP is used to combine the two NeRFs.
Figure 3: The scenes used for evaluation. We create nine synthetic Blender scenes with transparent objects and three real-world scenes, divided into scenes A-C of increasing difficulty.
Figure 4: Depth maps for Residual-NeRF and Dex-NeRF evaluated on the synthetic Blender dataset. The results suggest Residual-NeRF produces depth maps with fewer holes and less noise.
Figure 5: Depth maps inferred by Dex-NeRF [IchnowskiAvigal2021DexNeRF] and Residual-NeRF in the real world. The results suggest Residual-NeRF produces fewer holes and less noise.
...and 3 more figures
