LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker

TL;DR

LidaRF improves NeRF-based street-scene rendering by fusing LiDAR-derived geometry with a high-resolution hash-grid radiance field and by using LiDAR for robust depth supervision and view augmentation. It introduces a dense LiDAR encoding backbone built on a 3D sparse UNet, a curriculum-based occlusion-aware depth loss, and synthetic training views projected from LiDAR points, all integrated into a Nerfacto-inspired framework. On PandaSet, nuScenes, and Argoverse, LidaRF achieves state-of-the-art novel view synthesis, particularly under lane-shift extrapolation, by leveraging LiDAR to provide strong geometric priors and more complete depth supervision. The results suggest practical potential for photorealistic street-scene simulation in autonomous driving. The method models only the static background; extending it to dynamic objects is left as future work.
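As a rough illustration of projecting LiDAR into a camera view, the minimal Python sketch below turns camera-frame Lidar points into a sparse depth map. It assumes an ideal pinhole camera with intrinsics fx, fy, cx, cy and points already transformed into the camera frame (x right, y down, z forward). The function name `project_to_depth_map` and the dictionary-based depth map are hypothetical choices for illustration, not the paper's code.

```python
import math
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (column u, row v)


def project_to_depth_map(
    points_cam: Sequence[Point3],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> Dict[Pixel, float]:
    """Hypothetical helper: pinhole-project camera-frame points to a sparse depth map.

    Depth is the camera-axis z value. When several points fall on the same
    pixel, the nearest one is kept (a simple z-buffer). Pixels with no Lidar
    return are simply absent from the returned dict.
    """
    depth: Dict[Pixel, float] = {}
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point is behind the camera
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if not (0 <= u < width and 0 <= v < height):
            continue  # point projects outside the image
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
    return depth
```

For example, with fx = fy = 100, cx = cy = 50 and a 100 x 100 image, the points (0, 0, 10) and (0, 0, 5) land on the same pixel (50, 50), and only depth 5.0 is kept. Note that a per-pixel z-buffer only resolves exact collisions: background points that fall between sparse foreground returns still occupy pixels the foreground should occlude, which is the ghost-depth problem illustrated in Figure 3.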

Abstract

Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motion and sparser sampling at higher speeds. Moreover, the application often demands rendering from camera views that deviate from the inputs in order to accurately simulate behaviors such as lane changes. In this paper, we propose several insights that allow better utilization of Lidar data to improve NeRF quality on street scenes. First, our framework learns a geometric scene representation from Lidar, which is fused with the implicit grid-based representation for radiance decoding, thereby supplying the stronger geometric information offered by the explicit point cloud. Second, we put forth a robust occlusion-aware depth supervision scheme, which makes it possible to use Lidar points densified through accumulation. Third, we generate augmented training views from Lidar points for further improvement. Our insights translate to substantially improved novel view synthesis on real driving scenes.
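The abstract refers to Lidar points densified through accumulation. A minimal sketch of that idea, assuming each sweep comes with a known 4 x 4 row-major rigid sensor-to-world pose and that the scene is static, is to transform every sweep into a shared world frame and concatenate the results. The names `transform_points` and `accumulate_sweeps` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]  # row-major 4 x 4 rigid transform


def transform_points(pose: Matrix4, points: Sequence[Point3]) -> List[Point3]:
    """Apply a 4 x 4 sensor-to-world transform to 3D points (homogeneous w = 1)."""
    out: List[Point3] = []
    for x, y, z in points:
        out.append(tuple(
            pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
            for r in range(3)
        ))
    return out


def accumulate_sweeps(sweeps: Sequence[Tuple[Matrix4, Sequence[Point3]]]) -> List[Point3]:
    """Merge per-frame Lidar sweeps into one world-frame point cloud."""
    merged: List[Point3] = []
    for pose, pts in sweeps:
        merged.extend(transform_points(pose, pts))
    return merged
```

Because points are merged without any motion compensation, anything that moved between sweeps leaves a smeared trail, so this only holds for static geometry. Projecting such a merged cloud into a single view is also what produces the layered, partly occluded depth shown in Figure 3.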

Paper Structure (22 sections, 7 equations, 13 figures, 7 tables)

Figures (13)

  • Figure 1: Our framework makes extensive use of Lidar to unlock its potential for neural rendering on street scenes, achieving state-of-the-art performance compared with UniSim yang2023unisim.
  • Figure 2: Overview of LidaRF. It takes as input the sampled 3D positions $\textbf{x}$ and ray directions $\textbf{d}$ and outputs the corresponding density $\alpha$ and color $\textbf{c}$. It combines hash encoding with LiDAR encoding via a sparse UNet. Additionally, augmented training data is generated through LiDAR projections, and the geometry prediction is trained with our proposed robust depth supervision scheme.
  • Figure 3: Illustration of the occlusion issue on the depth map projected from accumulated Lidar points. Observe that multiple layers of surface points may project to the same region on the image, yielding ghost depth points.
  • Figure 4: Illustration of the true probability mass and its mid-point approximation.
  • Figure 5: Qualitative comparison of novel view synthesis across methods. We evaluate on both interpolation and extrapolation views, the latter corresponding to a lane shift. Boxes highlight the performance gap.
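Figures 3 and 4 concern how rendered geometry is compared with projected Lidar depth. The summary does not give the paper's equations, so the following is only a generic Python sketch under stated assumptions: standard NeRF volume-rendering weights w_i = T_i (1 - exp(-sigma_i * delta_i)), where sigma is the density (written $\alpha$ in Figure 2); each interval's probability mass placed at its midpoint to obtain a rendered depth; and a hypothetical occlusion-aware L1 term that ignores Lidar depths lying more than `margin` behind the rendered surface, treating them as likely ghost points. Whether this matches the paper's mid-point approximation or its curriculum is an assumption, and all function names are illustrative.

```python
import math
from typing import List, Sequence


def render_weights(sigmas: Sequence[float], t_edges: Sequence[float]) -> List[float]:
    """Per-interval termination probabilities w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    t_edges holds the N + 1 interval boundaries along the ray for N density samples.
    """
    if len(t_edges) != len(sigmas) + 1:
        raise ValueError("need exactly one more edge than density samples")
    weights: List[float] = []
    optical_depth = 0.0  # sum of sigma_j * delta_j over earlier intervals
    for i, sigma in enumerate(sigmas):
        delta = t_edges[i + 1] - t_edges[i]
        transmittance = math.exp(-optical_depth)  # T_i
        weights.append(transmittance * (1.0 - math.exp(-sigma * delta)))
        optical_depth += sigma * delta
    return weights


def expected_depth(weights: Sequence[float], t_edges: Sequence[float]) -> float:
    """Rendered depth with each interval's probability mass placed at its midpoint."""
    mids = [(a + b) / 2.0 for a, b in zip(t_edges[:-1], t_edges[1:])]
    return sum(w * m for w, m in zip(weights, mids))


def occlusion_aware_depth_loss(rendered: float, lidar: float, margin: float) -> float:
    """Hypothetical robust L1 term for one pixel.

    A Lidar depth more than `margin` behind the rendered surface is treated as a
    likely ghost point from occluded geometry and contributes no loss.
    """
    if lidar - rendered > margin:
        return 0.0
    return abs(rendered - lidar)
```

A fixed `margin` is a simplification. The TL;DR describes the paper's depth loss as curriculum-based; one way to express that in this sketch would be to schedule `margin` over training iterations, though the actual schedule is not given here.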