PC-NeRF: Parent-Child Neural Radiance Fields Using Sparse LiDAR Frames in Autonomous Driving Environments

Xiuzhong Hu, Guangming Xiong, Zheng Zang, Peng Jia, Yuxuan Han, Junyi Ma

TL;DR

This work tackles large-scale 3D scene reconstruction and novel LiDAR view synthesis from temporally sparse frames in autonomous driving. It introduces PC-NeRF, a hierarchical framework in which each parent NeRF shares a network with the child NeRFs inside it, and a multi-level scene representation makes efficient use of sparse LiDAR data. Training optimizes the representation at the scene, segment, and point levels with three losses, the parent NeRF depth loss $\mathcal{L}_{ij}^{\mathrm{pd}}$, the child NeRF free loss $\mathcal{L}_{ij}^{\mathrm{cf}}$, and the child NeRF depth loss $\mathcal{L}_{ij}^{\mathrm{cd}}$, with weights $\lambda_{\mathrm{pd}}$, $\lambda_{\mathrm{cf}}$, $\lambda_{\mathrm{cd}}$ and parameters $\varepsilon$, $\gamma$. At inference time, a two-step depth inference process first locates the relevant child NeRFs via axis-aligned bounding box (AABB) tests and then refines the depth within the selected child NeRF AABBs. Experiments on MaiCity and KITTI demonstrate that PC-NeRF achieves high-precision novel LiDAR view synthesis and 3D reconstruction with as little as one epoch of training, and ablations validate the effectiveness of the hierarchical partitioning and two-step depth inference for sparse data scenarios.
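
The first step of the two-step depth inference, finding which child NeRF AABBs a LiDAR ray passes through, can be illustrated with the standard slab test for ray-box intersection. The following is a minimal sketch, not the authors' implementation: the function names `ray_aabb_interval` and `candidate_children`, the dictionary layout of `child_boxes`, and the example boxes are all assumptions made for illustration.

```python
import math


def ray_aabb_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) where origin + t * direction is inside
    the axis-aligned box, or None if the ray misses it."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def candidate_children(origin, direction, child_boxes):
    """List (child_id, t_near, t_far) for every child box the ray hits, nearest first.
    `child_boxes` maps a hypothetical child id to (box_min, box_max)."""
    hits = []
    for child_id, (box_min, box_max) in child_boxes.items():
        interval = ray_aabb_interval(origin, direction, box_min, box_max)
        if interval is not None:
            hits.append((child_id, *interval))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical scene: three child boxes, one ray along +x from the LiDAR origin.
child_boxes = {
    "car": ((4, -1, -1), (6, 1, 1)),
    "wall": ((10, -5, -1), (11, 5, 3)),
    "tree": ((4, 5, -1), (5, 6, 3)),
}
hits = candidate_children((0, 0, 0), (1, 0, 0), child_boxes)
```

In this example the ray hits the "car" and "wall" boxes and misses "tree". The second step would then evaluate the network only within the returned depth intervals, which is where the refinement described above takes place.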

Abstract

Large-scale 3D scene reconstruction and novel view synthesis are vital for autonomous vehicles, especially when using temporally sparse LiDAR frames. However, conventional explicit representations remain a significant bottleneck for representing the reconstructed and synthetic scenes at unlimited resolution. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representations, the problem of large-scale 3D scene reconstruction and novel view synthesis using sparse LiDAR frames remains unexplored. To bridge this gap, we propose a 3D scene reconstruction and novel view synthesis framework called parent-child neural radiance field (PC-NeRF). Based on its two modules, parent NeRF and child NeRF, the framework implements hierarchical spatial partitioning and multi-level scene representation, including scene, segment, and point levels. The multi-level scene representation makes efficient use of sparse LiDAR point cloud data and enables the rapid acquisition of an approximate volumetric scene representation. Extensive experiments show that PC-NeRF achieves high-precision novel LiDAR view synthesis and 3D reconstruction in large-scale scenes. Moreover, PC-NeRF can effectively handle situations with sparse LiDAR frames and demonstrates high deployment efficiency with limited training epochs. Our implementation and the pre-trained models are available at https://github.com/biter0088/pc-nerf.

Paper Structure (15 sections, 10 equations, 7 figures, 6 tables)

This paper contains 15 sections, 10 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: PC-NeRF excels in 3D scene reconstruction and novel view synthesis, showcasing robustness to increased LiDAR frame sparsity with minimal training. Each subfigure depicts 3D scene reconstruction achieved by stitching real or synthetic LiDAR views with their corresponding poses. This scene involves frames 1151-1200 from the KITTI 00 sequence, encompassing diverse elements like the ground, grass, walls, and vehicles. White dots in each subfigure depict the LiDAR positions of each frame, and CD gauges 3D reconstruction accuracy, with smaller values indicating superior performance. As the proportion of LiDAR frames for training decreases, signifying increased sparsity, PC-NeRF achieves sparse-to-dense 3D reconstruction, as evident in the last three rows of subfigures. Moreover, utilizing only 33% of LiDAR frames during training demonstrates advantages in both reconstruction quality and time consumption compared to using 50% and 80% of frames, as depicted in the first three rows of subfigures. More details are in Sec. \ref{sec:Lesser point clouds for 3D reconstruction}.
  • Figure 2: Our PC-NeRF framework: (a) The hierarchical spatial partition divides the entire large-scale scene into large blocks, referred to as parent NeRFs. After multi-frame point cloud fusing, ground filtering, and non-ground point cloud clustering, a large block is further divided into point cloud geometric segments, each represented by a child NeRF. The parent NeRF shares a network with the child NeRFs within it. (b) In the multi-level scene representation, the LiDAR origin and the surface intersections of the LiDAR ray with the parent and child NeRF AABBs are used to divide the entire LiDAR ray into different line segments. The three losses on these line segments concurrently optimize the scene representation at the scene level, segment level, and point level, effectively leveraging sparse LiDAR frames. (c) For depth inference of each LiDAR ray, PC-NeRF searches in the parent NeRF AABB to locate corresponding child NeRF AABBs and then refines its inference in the child NeRF AABBs for higher precision.
  • Figure 3: Three LiDAR losses and child NeRF segmented sampling. The three LiDAR losses are the parent NeRF depth loss, the child NeRF depth loss, and the child NeRF free loss. The child NeRF segmented sampling uniformly samples both inside and outside the intersection of the LiDAR ray with the child NeRF, using a different sampling density for each.
  • Figure 4: Illustration of the effect of the parent-child NeRF two-step depth inference. The five subfigures in Fig. \ref{fig_infer}(b) represent depth value inference results for the five LiDAR rays in Fig. \ref{fig_infer}(a), where the weight distribution data comes from our proposed PC-NeRF model trained on the KITTI 00 sequence 1151-1200 frame scene in Sec. \ref{sec:Evaluating}.
  • Figure 5: 3D scene reconstruction on the MaiCity and KITTI datasets. Each subfigure represents the result of concatenating multiple LiDAR frames using real poses. The white dots in each subfigure represent the LiDAR positions of each frame. In Fig. (b): "one/two-step" denotes the one/two-step depth inference method. Fig. (b) illustrates the inference results corresponding to Tab. \ref{tab:Parent-child NeRF Inference Effect}. The "PC-NeRF--two step" column in Fig. (b) corresponds to the rows with 20% frame sparsity in Tab. \ref{tab:Lesser}.
  • ...and 2 more figures
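
The segmented sampling described in the Figure 3 caption, uniform sampling along a LiDAR ray with one density inside the ray's intersection with a child NeRF and another outside it, might look roughly like the sketch below. This is an illustration under stated assumptions rather than the paper's scheme: the function name `segmented_samples`, the sample budgets `n_inside` and `n_outside`, the length-proportional split of the outside budget, and the midpoint placement of outside samples are all choices made here for the example.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def segmented_samples(t_min, t_max, t_near, t_far, n_inside, n_outside):
    """Depth samples along one ray.

    [t_min, t_max] is the ray's extent (assumed: its span inside the parent AABB);
    [t_near, t_far] is its intersection with one child AABB.
    n_inside samples are spread uniformly over the child interval and
    n_outside samples uniformly over the rest of the ray.
    """
    if not (t_min <= t_near < t_far <= t_max):
        raise ValueError("child interval must lie inside the ray extent")
    inside = linspace(t_near, t_far, n_inside)
    front_len, back_len = t_near - t_min, t_max - t_far
    outside_len = front_len + back_len
    if outside_len == 0 or n_outside == 0:
        return inside
    # Assumption: split the outside budget in proportion to segment length.
    n_front = round(n_outside * front_len / outside_len)
    n_back = n_outside - n_front
    # Midpoints of equal-width bins, so no sample duplicates a child boundary.
    front = [t_min + (i + 0.5) * front_len / n_front for i in range(n_front)]
    back = [t_far + (i + 0.5) * back_len / n_back for i in range(n_back)]
    return front + inside + back


# Hypothetical ray: 20 m long, crossing a child AABB between 4 m and 6 m.
samples = segmented_samples(0.0, 20.0, 4.0, 6.0, n_inside=5, n_outside=6)
```

With these numbers, 5 samples cover the 2 m child interval while 6 samples cover the remaining 18 m, so most of the sampling effort is spent where the surface is expected to be.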