S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation

Yurui Chen, Junge Zhang, Ziyang Xie, Wenye Li, Feihu Zhang, Jiachen Lu, Li Zhang

TL;DR

S-NeRF++ introduces a neural reconstruction–based autonomous driving simulator capable of large-scale background reconstruction and dynamic foreground generation. It combines dense depth supervision, learnable depth confidence, and a BEV-aware foreground placement with automated object insertion and physically based rendering to achieve realistic, diverse street scenes. Foreground assets are expanded via diffusion-based generation and NeuS-based reconstruction, enabling an extensive foreground bank and automated data production that improves downstream perception tasks on nuScenes and Waymo. Ablation studies demonstrate the value of depth-confidence, pose refinement, object-insertion refinement, and generation-versus-reconstruction strategies, with results outperforming prior approaches like StreetSurf and EmerNeRF in both reconstruction quality and downstream task performance.

Abstract

Autonomous driving simulation systems play a crucial role in enhancing self-driving data and simulating complex and rare traffic scenarios, which helps ensure navigation safety. However, traditional simulation systems, which often rely heavily on manual modeling and 2D image editing, struggle to scale to extensive scenes and to generate realistic simulation data. In this study, we present S-NeRF++, an autonomous driving simulation system based on neural reconstruction. Trained on widely used self-driving datasets such as nuScenes and Waymo, S-NeRF++ can generate a large number of realistic street scenes and foreground objects with high rendering quality, while offering considerable flexibility in manipulation and simulation. Specifically, S-NeRF++ is an enhanced neural radiance field for synthesizing large-scale scenes and moving vehicles, with improved scene parameterization and camera pose learning. The system uses noisy and sparse LiDAR data to refine training and to handle depth outliers, yielding high-quality reconstruction and novel-view rendering. It also provides a diverse foreground asset bank, built by reconstructing and generating different foreground vehicles, to support comprehensive scenario creation. Moreover, we have developed a foreground-background fusion pipeline that integrates illumination and shadow effects, further enhancing the realism of our simulations. With the high-quality simulated data provided by S-NeRF++, we find that perception methods gain performance on several autonomous driving downstream tasks, further demonstrating the effectiveness of the proposed simulator.

Paper Structure

This paper contains 47 sections, 20 equations, 17 figures, 11 tables.

Figures (17)

  • Figure 1: The whole pipeline of our simulation system. We construct our background representation and foreground bank. Given any camera pose and object pose, we render the two branches separately. We then determine the occlusion relationship between the foreground and background from depth, process the edges and illumination of the inserted object, render its shadow, and finally obtain the simulation data with labels.
  • Figure 2: Our depth supervision with confidence maps.
  • Figure 3: Illustration of the confidence computation process.
  • Figure 4: Visualization of each confidence component. Brighter regions indicate higher confidence. The geometry confidences (flow and depth) represent geometric consistency. The perception confidence measures photometric, local-structure, and feature consistency.
  • Figure 5: Cascade sampling strategy for reconstructing the background and moving objects together. The object network, whose weights are shared across all sampling stages, decodes volume rendering attributes for points inside the corresponding object bounding box, while the background/proposal network decodes points that lie outside every box. The rendering weights from the proposal stage guide point sampling in the next stage and are supervised only by the next stage's rendering weights through $\mathcal{L}_{prop}$. Note that no gradient is passed to the object network in the proposal stage.
  • ...and 12 more figures
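
The depth-based occlusion step named in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `composite_by_depth`, the flat per-pixel list layout, and the use of a foreground opacity value are all assumptions made for illustration. Edge processing, relighting, and shadow rendering, which the caption also mentions, are left out.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def composite_by_depth(
    bg_rgb: List[Pixel],
    bg_depth: List[float],
    fg_rgb: List[Pixel],
    fg_depth: List[float],
    fg_alpha: List[float],
) -> Tuple[List[Pixel], List[float], List[bool]]:
    """Fuse separately rendered foreground and background pixels.

    Hypothetical helper: a foreground pixel is kept only where the object
    covers the pixel (alpha > 0) and lies closer to the camera than the
    background. Inputs are flat lists with one entry per pixel.
    """
    out_rgb: List[Pixel] = []
    out_depth: List[float] = []
    fg_mask: List[bool] = []
    for b_rgb, b_d, f_rgb, f_d, a in zip(bg_rgb, bg_depth, fg_rgb, fg_depth, fg_alpha):
        # Depth test: the inserted object is visible only if it is in front.
        visible = a > 0.0 and f_d < b_d
        w = a if visible else 0.0
        # Alpha-blend the visible foreground over the background.
        blended = tuple(w * f + (1.0 - w) * b for f, b in zip(f_rgb, b_rgb))
        out_rgb.append(blended)
        out_depth.append(f_d if visible else b_d)
        fg_mask.append(visible)
    return out_rgb, out_depth, fg_mask


# Three pixels: object in front, object absent, object behind the background.
black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
rgb, depth, mask = composite_by_depth(
    bg_rgb=[black, black, black],
    bg_depth=[10.0, 10.0, 2.0],
    fg_rgb=[white, white, white],
    fg_depth=[5.0, 5.0, 5.0],
    fg_alpha=[1.0, 0.0, 1.0],
)
print(mask)
```

In this sketch the returned visibility mask could also serve as a per-pixel label for the inserted object. How the actual system derives its labels is not specified in the captions above.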