Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

TL;DR

A crowd-sourced framework that utilizes substantial data captured by production vehicles to reconstruct scenes with a NeRF model, together with an application, first-view navigation, which leverages the NeRF model to generate a 3D street view and guide the driver with a synthesized video.

Abstract

Recently, Neural Radiance Fields (NeRF) have achieved impressive results in novel view synthesis. Block-NeRF demonstrated that NeRF can be leveraged to build large, city-scale models. Large-scale modeling requires massive image data, and collecting images with specially designed data-collection vehicles cannot support large-scale applications; how to acquire massive high-quality data remains an open problem. Noting that the automotive industry holds a huge amount of image data, crowd-sourcing is a convenient way to collect data at scale. In this paper, we present a crowd-sourced framework that utilizes substantial data captured by production vehicles to reconstruct the scene with a NeRF model. This approach addresses the key problems of large-scale reconstruction: where the data come from and how to use them. First, the massive crowd-sourced data are filtered to remove redundancy and keep a balanced distribution in time and space. Then a structure-from-motion module refines the camera poses. Finally, the images and poses are used to train a NeRF model for each block. We highlight that we present a comprehensive framework integrating multiple modules: data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of the ground surface, and occlusion completion. The complete system effectively processes and reconstructs high-quality 3D scenes from crowd-sourced data. Extensive quantitative and qualitative experiments validate the performance of our system. Moreover, we propose an application, first-view navigation, which leverages the NeRF model to generate a 3D street view and guide the driver with a synthesized video.

Paper Structure

This paper contains 34 sections, 7 equations, 15 figures, 4 tables, 1 algorithm.

Figures (15)

  • Figure 1: (a) shows the basic idea of crowd-sourced NeRF, which is collecting data from production vehicles to train the NeRF model for large-scale reconstruction. (b) shows an application, first-view navigation. A reference line (in yellow) is rendered within the realistic scene, which provides the driver with a clearer experience. The video can be found at: https://youtu.be/oVUC634R1zw.
  • Figure 2: The structure of the proposed crowd-sourcing system. The strategy of crowd-sourced data collection, elaborated in the data-collection section, collects massive data and filters it for a balanced spatial and temporal distribution. Then, the data pre-processing module segments images semantically, extracts the depth of the ground surface, and refines camera poses by SfM. The NeRF training procedure trains the NeRF model with three improvements: sequence appearance embedding, surface depth supervision, and occlusion completion.
  • Figure 3: In (a), the image is segmented into multiple semantic groups, such as lane, crosswalk, vehicle, tree, road, stop lines, etc. (b) is the diagram of the inverse projection process. The pixel is inversely projected to the ground ($z_v = 0$), so that the depth $d$ of the ray can be obtained.
  • Figure 4: The illustration of depth supervision. The density distribution of sample points along a ray is supervised by the Dirac function.
  • Figure 5: (a) shows the vehicle we used for crowd-sourced data collection. (b) shows the sensor setup we used for experiments. (The vehicle contains more sensors than we used.)
  • ...and 10 more figures
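
The inverse projection described in Figure 3(b) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ground_depth` and the frame conventions (intrinsics `K`, camera-to-vehicle rotation `R_vc`, camera origin `t_vc`, vehicle z-axis pointing up) are assumptions for the sketch.

```python
import numpy as np

def ground_depth(u, v, K, R_vc, t_vc):
    """Depth of a pixel's ray at its intersection with the ground plane z_v = 0.

    u, v : pixel coordinates
    K    : 3x3 camera intrinsics
    R_vc : 3x3 rotation, camera frame -> vehicle frame
    t_vc : 3-vector, camera origin expressed in the vehicle frame
    Returns the metric depth d along the ray, or None if the ray
    never reaches the ground in front of the camera.
    """
    # Back-project the pixel to a ray direction in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into the vehicle frame and normalize it,
    # so the intersection parameter below is a metric distance.
    ray_veh = R_vc @ ray_cam
    ray_veh /= np.linalg.norm(ray_veh)
    if abs(ray_veh[2]) < 1e-9:
        return None  # ray is parallel to the ground plane
    # Solve t_vc + d * ray_veh hitting the plane z_v = 0.
    d = -t_vc[2] / ray_veh[2]
    return d if d > 0 else None
```

With only pixels that segment as road/ground (Figure 3(a)) passed through this projection, each such pixel yields a metric depth `d` usable as supervision.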
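
Figure 4 describes supervising the density distribution along a ray with a Dirac function at the ground depth. One common way to approximate such a target is to penalize rendering weights that place mass away from the known depth; the sketch below is one such approximation under that assumption, not necessarily the paper's exact loss, and the name `dirac_depth_loss` is hypothetical.

```python
import numpy as np

def dirac_depth_loss(weights, t_samples, d_gt):
    """Encourage a ray's rendering weights w_i (which sum to ~1 on an
    opaque ray) to concentrate at the supervised depth d_gt, pushing
    the weight distribution toward a Dirac at d_gt.
    """
    weights = np.asarray(weights, dtype=float)
    t_samples = np.asarray(t_samples, dtype=float)
    # Expected squared distance of the weight mass from d_gt:
    # zero exactly when all mass sits at the supervised depth.
    return float(np.sum(weights * (t_samples - d_gt) ** 2))
```

The loss vanishes when all rendering weight lies at `d_gt` and grows as density leaks to other sample depths, which is the behavior Figure 4 illustrates.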