Are NeRFs ready for autonomous driving? Towards closing the real-to-simulation gap

Carl Lindström, Georg Hess, Adam Lilja, Maryam Fatemi, Lars Hammarstrand, Christoffer Petersson, Lennart Svensson

TL;DR

The paper tackles the real-to-simulation gap in NeRF-based autonomous driving testing by shifting focus from rendering fidelity to perception robustness against NeRF artifacts. It introduces image augmentations, mixed-in NeRF fine-tuning, and NeRF-like image translation to train perception models that perform consistently on real and NeRF-rendered data, validated on nuScenes with 3D object detection and online mapping tasks. Large-scale experiments reveal substantial reductions in the real2sim gap for several models, with strong correlations between transfer success and perceptual metrics such as LPIPS and FID. While extrapolated viewpoints still pose challenges, the study provides practical guidance for deploying NeRFs in AD testing and highlights perceptual quality as a key driver of cross-domain transfer.

Abstract

Neural Radiance Fields (NeRFs) have emerged as promising tools for advancing autonomous driving (AD) research, offering scalable closed-loop simulation and data augmentation capabilities. However, to trust the results achieved in simulation, one needs to ensure that AD systems perceive real and rendered data in the same way. Although the performance of rendering methods is increasing, many scenarios will remain inherently challenging to reconstruct faithfully. To this end, we propose a novel perspective for addressing the real-to-simulated data gap. Rather than solely focusing on improving rendering fidelity, we explore simple yet effective methods to enhance perception model robustness to NeRF artifacts without compromising performance on real data. Moreover, we conduct the first large-scale investigation into the real-to-simulated data gap in an AD setting using a state-of-the-art neural rendering technique. Specifically, we evaluate object detectors and an online mapping model on real and simulated data, and study the effects of different fine-tuning strategies. Our results show notable improvements in model robustness to simulated data, even improving real-world performance in some cases. Last, we delve into the correlation between the real-to-simulated gap and image reconstruction metrics, identifying FID and LPIPS as strong indicators. See https://research.zenseact.com/publications/closing-real2sim-gap for our project page.
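
The correlation analysis mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes the real-to-sim gap is measured per scene as the difference between a model's score on real images and its score on the corresponding rendered images, and that a Pearson coefficient is used; both choices are assumptions for illustration, not the paper's stated protocol. The function names `real2sim_gap` and `pearson` are hypothetical.

```python
from math import sqrt


def real2sim_gap(real_score: float, sim_score: float) -> float:
    """Assumed definition: score on real images minus score on rendered images."""
    return real_score - sim_score


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def gap_metric_correlation(
    real_scores: list[float],
    sim_scores: list[float],
    metric_values: list[float],
) -> float:
    """Correlate the per-scene gap with a per-scene image metric (e.g. LPIPS)."""
    gaps = [real2sim_gap(r, s) for r, s in zip(real_scores, sim_scores)]
    return pearson(gaps, metric_values)
```

A coefficient close to 1 for a lower-is-better metric such as LPIPS would mean that scenes with worse renderings also show a larger gap, which is the kind of relationship the abstract describes for FID and LPIPS.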

Paper Structure

This paper contains 25 sections, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: Using NeRFs for autonomous driving testing requires perception models to treat rendered and real images similarly. A BEVFormer model trained on real data detects objects (blue) in high-quality renderings (top). However, when quality decreases (bottom), e.g., scenes challenging for the NeRF, the same model fails to detect even close-by cars. Instead of emphasizing rendering fidelity, we propose to make models robust to these distortions. Fine-tuning the same model on NeRF-like images (red) reduces the real-to-sim gap without harming real-world performance.
  • Figure 2: Overview of our data pipeline for fine-tuning (top) and evaluating (bottom) perception models. We explore three different augmentation methods for fine-tuning.
  • Figure 3: Examples of our different data augmentation strategies to make perception models more robust.
  • Figure 4: Detection agreement vs. fraction of the evaluation range, evaluated for the 3DOD models with different fine-tuning methods.
  • Figure 5: Online mapping predictions and ground truth for images shifted to the left (top), real images (middle), and rendered images without shift (bottom). When the input data is shifted 2 m to the left, the ego vehicle should straddle the left road boundary, highlighted in red. However, the predictions keep the ego vehicle within the boundary despite the image shift.
  • ...and 4 more figures