
NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization

Peng Tu, Xun Zhou, Mingming Wang, Xiaojun Yang, Bo Peng, Ping Chen, Xiu Su, Yawen Huang, Yefeng Zheng, Chang Xu

TL;DR

NeRF2Points presents a tailored NeRF variant for large-scale urban point cloud generation from street-view RGB data. It leverages Layered Perception and Integrated Modeling (LPiM) to separately model road surfaces and street views, and employs Geometric-Aware Consistency Regularization (GAC) with spatial dynamic consistency and temporal invariant consistency losses to mitigate pavement collapse and other artifacts. The approach is supported by a 20-kilometer street-view dataset with high-resolution imagery, depth maps, normals, and LiDAR ground truth, and demonstrates quantitative gains over several NeRF baselines in PSNR, SSIM, and Chamfer Distance, along with ablation analyses that highlight the contributions of LPiM and the proposed losses. By enabling RGBD point cloud generation from RGB sequences, NeRF2Points offers a cost-effective pathway to dense urban perception data with potential for 4D extensions in future work.
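For readers unfamiliar with the geometric metric named above, the following is a minimal sketch of a symmetric Chamfer Distance between a generated point cloud and a LiDAR reference. It is an illustration only: the function name `chamfer_distance`, the brute-force nearest-neighbour search, and the choice of unsquared distances averaged per direction are assumptions, and the exact variant used in the paper's evaluation is not stated in this summary.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty lists of 3D points.

    Assumed variant: mean unsquared nearest-neighbour distance from a to b,
    plus the same quantity from b to a. Other variants use squared distances.
    """
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")

    def mean_nearest(src, dst):
        # Average distance from each point in src to its nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(a, b) + mean_nearest(b, a)


# Toy data (hypothetical, not from the paper's dataset).
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
lidar = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(generated, lidar))
```

The brute-force search is quadratic in the number of points, so a spatial index such as a k-d tree would be needed at the scale of real street-view point clouds.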

Abstract

Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility of NeRF: the derivation of point clouds from aggregated urban landscape imagery. The transmutation of street-view data into point clouds is fraught with complexities, attributable to a nexus of interdependent variables. First, high-quality point cloud generation hinges on precise camera poses, yet many datasets suffer from inaccuracies in pose metadata. Also, the standard approach of NeRF is ill-suited for the distinct characteristics of street-view data from autonomous vehicles in vast, open settings. Autonomous vehicle cameras often record with limited overlap, leading to blurring, artifacts, and compromised pavement representation in NeRF-based point clouds. In this paper, we present NeRF2Points, a tailored NeRF variant for urban point cloud synthesis, notable for its high-quality output from RGB inputs alone. Our paper is supported by a bespoke, high-resolution 20-kilometer urban street dataset, designed for point cloud generation and evaluation. NeRF2Points adeptly navigates the inherent challenges of NeRF-based point cloud synthesis through the implementation of the following strategic innovations: (1) Integration of Weighted Iterative Geometric Optimization (WIGO) and Structure from Motion (SfM) for enhanced camera pose accuracy, elevating street-view data precision. (2) Layered Perception and Integrated Modeling (LPiM) is designed for distinct radiance field modeling in urban environments, resulting in coherent point cloud representations.

Paper Structure (22 sections, 13 equations, 8 figures, 2 tables)

This paper contains 22 sections, 13 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: (a) NeRF's camera setting: the original NeRF is usually designed for object-centric scenes and requires hundreds of highly accurate, heavily overlapping camera poses; (b) Self-driving car's camera setting: street-view data are usually collected with limited-perspective cameras moving along the road, with almost no overlap between different road sections and no object-centric camera views. Moreover, because the self-driving car moves fast, some objects or contents appear in only a few images (2 $\sim$ 6), whereas most objects need to be reconstructed from hundreds of surrounding views with large overlaps, as shown in (a).
  • Figure 2: Four major defects when using NeRF to generate point clouds. (a) and (c) show floating artifacts and blurriness, and geometric inconsistency: these issues arise from the street-view data collection pipeline, which yields sparse views and makes it challenging to establish effective geometric constraints for modeling spatio-temporal radiance fields (as discussed in Sec. \ref{sec:GAC}). (b) shows layering: the significant stratification observed between continuous but distinct road-section point clouds generated by NeRF. (d) shows pavement collapse: weak-texture pixels have small and highly similar gradient values, which makes it difficult to recover fine corresponding regions in the radiance field. Specifically, bundles of points representing road surfaces are poorly optimized, leading to inaccurate depth estimation in these regions.
  • Figure 3: Examples from the 20 kilometers of street-view data we collected.
  • Figure 4: Overview of our proposed method for generating point clouds from NeRF (NeRF2Points). RGB sequences and their corresponding depth maps, along with normal vectors, are fed into the ray gate function $\textit{M}(\cdot)$, which separates the road and street-scene information for each ray. After modeling the point clouds for the road and the street scenes individually, we merge them to create a complete street-view point cloud. The merged point cloud is highlighted by the yellow dotted box in the center of the right image.
  • Figure 5: We focus on optimizing hard examples to mitigate artifacts in generated point clouds. In the picture above, the red and blue grid corresponds to the $s ~\times~ s$ field $\mathcal{R}^{'}$ centered on the red hard sample. Within this field, we randomly choose sample points and combine them with the red hard sample point to form a consistency loss, which helps control the error optimization of the hard sample points.
  • ...and 3 more figures
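
The Figure 4 caption describes a ray gate function $\textit{M}(\cdot)$ that routes each ray to either the road model or the street-scene model using depth and normal information. The sketch below shows one way such a gate could work, and is not the paper's implementation: it assumes a ray counts as "road" when its surface normal lies within a fixed angle of an assumed up axis. The name `ray_gate`, the `up` axis, and the 15-degree threshold are all hypothetical, and the depth input is omitted for brevity.

```python
import math


def ray_gate(normal, up=(0.0, 0.0, 1.0), max_angle_deg=15.0):
    """Label a ray "road" or "street" from its estimated surface normal.

    Assumptions (not from the paper): `up` is a unit vector, and a normal
    within `max_angle_deg` of `up` indicates a road surface.
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        # A degenerate normal carries no orientation; default to "street".
        return "street"
    # Cosine of the angle between the normal and the up axis.
    cos_angle = sum(n * u for n, u in zip(normal, up)) / length
    threshold = math.cos(math.radians(max_angle_deg))
    return "road" if cos_angle >= threshold else "street"


# Toy normals (hypothetical): flat ground, a vertical wall, a gentle slope.
labels = [ray_gate(n) for n in [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.1, 0.0, 1.0)]]
print(labels)
```

Under this assumption, each ray gets exactly one label, so the two radiance fields can be trained on disjoint ray sets and their point clouds merged afterwards, as the caption describes.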