NeuralLabeling: A versatile toolset for labeling vision datasets using Neural Radiance Fields
Floris Erich, Naoya Chiba, Yusuke Yoshiyasu, Noriaki Ando, Ryo Hanai, Yukiyasu Domae
TL;DR
NeuralLabeling is a NeRF-based labeling toolkit for efficient, 3D-consistent annotation of vision datasets from multi-view image sequences. It supports bounding-box and mesh-based labeling pipelines and uses occlusion information from the NeRF render to produce outputs such as depth maps, 6DOF poses, and segmentation masks, including for challenging transparent objects. The authors create Dishwasher30k, a large supervised depth-completion dataset, and show that a model trained on NeuralLabeling-generated depth maps outperforms the previously applied weakly supervised approach. In a robot demonstration, grasping transparent glasses succeeds 83.3% of the time, compared to 16.3% without depth completion. The work shows how NeRF-based labeling can accelerate large-scale, geometry-aware dataset creation for perception and robotic manipulation in complex scenes, while noting that labeling time remains a bottleneck for broader deployment.
Abstract
We present NeuralLabeling, a labeling approach and toolset for annotating 3D scenes using either bounding boxes or meshes and generating segmentation masks, affordance maps, 2D bounding boxes, 3D bounding boxes, 6DOF object poses, depth maps, and object meshes. NeuralLabeling uses Neural Radiance Fields (NeRF) as a renderer, allowing labeling to be performed using 3D spatial tools while incorporating geometric cues such as occlusions, relying only on images captured from multiple viewpoints as input. To demonstrate the applicability of NeuralLabeling to a practical problem in robotics, we added ground truth depth maps to 30,000 frames of RGB images and noisy depth maps of transparent glasses placed in a dishwasher, captured using an RGBD sensor, yielding the Dishwasher30k dataset. We show that training a simple deep neural network with supervision from the annotated depth maps yields higher reconstruction performance than training with the previously applied weakly supervised approach. We also show how instance segmentation and depth completion datasets generated using NeuralLabeling can be incorporated into a robot application for grasping transparent objects placed in a dishwasher with an accuracy of 83.3%, compared to 16.3% without depth completion.
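To make the role of occlusions concrete, the following is a minimal sketch of how an occlusion-aware segmentation mask could be derived from two depth renders of the same viewpoint. It is an illustration under stated assumptions, not the NeuralLabeling implementation: the function name `occlusion_aware_mask`, the `tolerance` parameter, and the convention that pixels not covered by the object hold infinite depth are all hypothetical. The idea is that a pixel belongs to the visible part of a labeled object only if the object, rendered alone, is not farther from the camera than the full scene at that pixel.

```python
import numpy as np


def occlusion_aware_mask(object_depth, scene_depth, tolerance=0.01):
    """Return a boolean mask of pixels where the labeled object is visible.

    object_depth: depth of the object rendered alone; np.inf where the
                  object does not cover the pixel (assumed convention).
    scene_depth:  depth of the full scene from the same viewpoint, for
                  example rendered from a NeRF (assumption).
    tolerance:    slack, in depth units, absorbing small render noise.
    """
    object_depth = np.asarray(object_depth, dtype=float)
    scene_depth = np.asarray(scene_depth, dtype=float)
    if object_depth.shape != scene_depth.shape:
        raise ValueError("depth maps must have the same shape")

    # Pixels the object projects onto at all.
    covered = np.isfinite(object_depth)
    # Pixels where nothing in the scene lies in front of the object.
    unoccluded = object_depth <= scene_depth + tolerance
    return covered & unoccluded


if __name__ == "__main__":
    obj = np.array([[1.0, np.inf], [2.0, 1.5]])
    scene = np.array([[1.0, 0.5], [1.0, 1.505]])
    print(occlusion_aware_mask(obj, scene))
```

In this toy input, the lower-left pixel is dropped from the mask because the scene surface (depth 1.0) lies in front of the object (depth 2.0), and the upper-right pixel is dropped because the object does not cover it.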
