SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection

Yifu Tao; Yash Bhalgat; Lanke Frank Tarimo Fu; Matias Mattamala; Nived Chebrolu; Maurice Fallon

TL;DR

SiLVR addresses the challenge of large-scale robotic 3D reconstruction by fusing lidar geometry with NeRF-based texture. It extends neural radiance fields with lidar-derived depth and surface-normal constraints, and uses a lidar-SLAM trajectory to bootstrap metric scale and accelerate Structure-from-Motion via COLMAP. Submapping partitions large scenes into local NeRFs trained with hash-encoded representations, enabling 600 m-scale reconstructions while mitigating boundary artifacts. Evaluations across handheld, legged, and aerial platforms show improved geometric fidelity and photorealistic novel-view synthesis compared with vision-only NeRFs, approaching lidar-only accuracy with higher surface completeness.
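To make the submapping idea concrete, here is a minimal sketch that splits a trajectory into overlapping submaps by travelled distance. The distance-based rule, the function name `partition_trajectory`, and the parameters `submap_length_m` and `overlap_m` are all assumptions for illustration; the summary above does not say which partitioning criterion the authors use. The overlap is one plausible way to give neighbouring local NeRFs shared observations near their boundaries.

```python
import numpy as np

def partition_trajectory(positions, submap_length_m=100.0, overlap_m=10.0):
    """Split pose indices into overlapping submaps by travelled distance.

    positions: (N, 3) array of sensor positions in metres (metric scale assumed).
    Returns a list of index arrays, one per submap.
    NOTE: hypothetical rule for illustration, not the paper's criterion.
    """
    positions = np.asarray(positions, dtype=float)
    # Cumulative path length at each pose.
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    travelled = np.concatenate([[0.0], np.cumsum(steps)])
    total = travelled[-1]

    submaps = []
    start = 0.0
    while True:
        # Extend each window by the overlap on both sides.
        lo = max(start - overlap_m, 0.0)
        hi = start + submap_length_m + overlap_m
        submaps.append(np.flatnonzero((travelled >= lo) & (travelled <= hi)))
        start += submap_length_m
        if start >= total:
            break
    return submaps

# Usage: a straight 600 m walk sampled every metre.
poses = np.stack([np.arange(0.0, 601.0), np.zeros(601), np.zeros(601)], axis=1)
submaps = partition_trajectory(poses)
```

Each index array would then select the images and lidar scans used to train one local NeRF.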

Abstract

We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate reconstructions that are geometrically accurate and capture photo-realistic textures. The system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data, which adds strong geometric constraints on depth and surface normals. We exploit the trajectory from a real-time lidar SLAM system to bootstrap a Structure-from-Motion (SfM) procedure, which both significantly reduces computation time and provides the metric scale that is crucial for the lidar depth loss. We use submapping to scale the system to large environments captured over long trajectories. We demonstrate the reconstruction system with data from a multi-camera, lidar sensor suite carried onboard a legged robot, hand-held while scanning building scenes for 600 metres, and onboard an aerial robot surveying a multi-storey mock disaster site building. Website: https://ori-drs.github.io/projects/silvr/
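The sketch below illustrates how lidar depth and surface-normal constraints might be added to a photometric NeRF loss, and why metric scale matters: rendered depth is compared directly with lidar range in metres. The specific loss forms (L1 on depth, one minus cosine similarity on normals), the weights `w_depth` and `w_normal`, and the convention that a lidar depth of 0 marks a ray with no return are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def lidar_visual_loss(rgb_pred, rgb_gt, depth_pred, depth_lidar,
                      normal_pred, normal_lidar,
                      w_depth=0.1, w_normal=0.01):
    """Photometric loss plus lidar depth and normal terms over a ray batch.

    rgb_*: (N, 3), depth_*: (N,), normal_*: (N, 3).
    Assumption: depth_lidar == 0 marks rays without a lidar return.
    Weights and loss forms are hypothetical.
    """
    photometric = np.mean((rgb_pred - rgb_gt) ** 2)

    valid = depth_lidar > 0  # lidar is sparse, so supervise only where it hit
    if not np.any(valid):
        return photometric

    # Depth term: only meaningful if rendered depth is in metres too.
    depth = np.mean(np.abs(depth_pred[valid] - depth_lidar[valid]))

    # Normal term: 1 - cosine similarity between unit normals.
    def unit(v):
        norm = np.linalg.norm(v, axis=-1, keepdims=True)
        return v / np.maximum(norm, 1e-8)

    cos = np.sum(unit(normal_pred[valid]) * unit(normal_lidar[valid]), axis=-1)
    normal = np.mean(1.0 - cos)

    return photometric + w_depth * depth + w_normal * normal
```

If the camera poses came from SfM alone, rendered depth would be in an arbitrary scale and the depth term would be meaningless, which is why the abstract stresses that the SLAM trajectory provides metric scale.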

Paper Structure (17 sections, 3 equations, 6 figures, 2 tables)

INTRODUCTION
RELATED WORKS
Large-scale 3D Reconstruction
Neural Field Representation
METHOD
NeRF-based Scene Representation
Geometric Constraints from Lidar Measurements
Bootstrapping Camera Poses from SLAM with scale
Scaling NeRF with Submapping
EXPERIMENTAL RESULTS
Hardware and Datasets
Evaluation Metrics
Evaluation of the 3D Reconstruction
Effect of Lidar Surface Normal Loss
Effect of Bootstrapping SLAM Poses
...and 2 more sections

Figures (6)

Figure 1: Large-scale reconstruction consisting of 8 submaps of Maths Institute and H B Allen Centre in Oxford. The bottom row shows the novel views synthesised from the model and surface normals at three different locations. The trajectory of each submap is visualised in a different colour.
Figure 2: System Overview: Frontier, our custom perception unit, has three fisheye colour cameras with an IMU and a 3D lidar. Our online state estimator's trajectory is refined with COLMAP and partitioned into submaps. The camera image, lidar depth, and normal image are used to train a NeRF to get the final 3D reconstruction.
Figure 3: Comparison of reconstruction quality of Lidar-SLAM, Nerfacto (vision-only) and our approach. Reconstructions are coloured by point-to-point distance to the ground truth, with error increasing from blue (0 m) to red (1 m). The trajectory is shown in purple and overlaid on the ground truth scan captured using a Leica BLK360. The zoomed-in views show the difference in reconstruction quality. Overall, our approach is more complete than the lidar-only reconstruction and geometrically more consistent than the vision-only reconstruction.
Figure 4: Comparison of reconstructions of the HBAC building using the front camera only vs. using all three cameras. The three-camera setup generates more complete and accurate reconstructions than a single front-facing camera. The multi-camera setting is important in robotic applications where it would be infeasible to actively scan the entire scene to obtain strong viewpoint constraints.
Figure 5: Comparison of surface normal renderings of the Maths Institute. Incorporating normal constraints in addition to depth from lidar improves the smoothness of the reconstruction. Right: The smooth reconstruction of the ground portion highlights this improvement.
...and 1 more figure
