XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis

Hao Li, Chenming Wu, Ming Yuan, Yan Zhang, Chen Zhao, Chunyu Song, Haocheng Feng, Errui Ding, Dingwen Zhang, Jingdong Wang

TL;DR

XLD introduces a cross-lane driving dataset and benchmark to evaluate novel view synthesis under closed-loop autonomous driving conditions, addressing the gap where existing NVS benchmarks only interpolate within training trajectories. Built in CARLA, the dataset provides six scenes with cross-trajectory testing offsets of $0\mathrm{m}$ to $4\mathrm{m}$, three RGB cameras, and LiDAR, enabling front-only and multi-camera evaluation in diverse weather. Benchmark results show NeRF-based methods generally outperform 3D Gaussian Splatting in cross-lane scenarios, with EmerNeRF and UC-NeRF delivering robust cross-lane performance while Gaussian baselines can overfit and degrade with offsets; multi-camera training improves cross-lane fidelity, and precise geometry is key for realistic cross-lane rendering. PVG’s self-decomposition improves background rendering but can still falter at larger offsets, highlighting the ongoing need for geometry-aware, cross-trajectory NVS methods. The work provides a valuable platform for advancing NVS toward realistic closed-loop autonomous driving simulations and motivates future real-world cross-lane ground-truth datasets to validate these synthetic benchmarks.

Abstract

Comprehensive testing of autonomous systems through simulation is essential to ensure the safety of autonomous driving vehicles. This requires the generation of safety-critical scenarios that extend beyond the limitations of real-world data collection, as many of these scenarios are rarely encountered on public roads. However, most existing novel view synthesis (NVS) methods are evaluated by sporadically sampling image frames from the training data and comparing the rendered images with ground-truth images. Unfortunately, this evaluation protocol falls short of the actual requirements of closed-loop simulation. Specifically, the real application demands the capability to render novel views that extend beyond the original trajectory (such as cross-lane views), which are challenging to capture in the real world. To address this gap, this paper presents a synthetic dataset for evaluating novel driving view synthesis, specifically designed for autonomous driving simulation. The dataset includes testing images captured by deviating from the training trajectory by $1-4$ meters. It comprises six sequences covering various times of day and weather conditions. Each sequence contains $450$ training images, $120$ testing images, and their corresponding camera poses and intrinsic parameters. Leveraging this dataset, we establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings. The experimental findings underscore a significant gap in current approaches, revealing their limited ability to meet the demanding requirements of cross-lane and closed-loop simulation.
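
As a minimal sketch of the evaluation protocol this implies, a per-offset comparison of rendered and ground-truth test views might look like the following. The directory layout, the `render_fn` interface, and the choice of metrics here are assumptions for illustration, not the dataset's released tooling.

```python
# Minimal sketch of a cross-lane evaluation loop (hypothetical file layout and
# renderer interface; PSNR/SSIM via scikit-image, LPIPS would additionally
# require the `lpips` package).
from pathlib import Path

import imageio.v3 as iio
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

OFFSETS_M = [0, 1, 2, 4]  # lateral deviations of the test trajectories from the training lane


def evaluate_sequence(seq_dir: Path, render_fn) -> dict:
    """Compare rendered views against ground-truth test images, grouped by lane offset.

    `render_fn(pose, intrinsics)` is a stand-in for any trained NVS model
    (NeRF- or Gaussian-based) that returns an HxWx3 float image in [0, 1].
    """
    results = {}
    for offset in OFFSETS_M:
        psnrs, ssims = [], []
        for gt_path in sorted((seq_dir / f"test_offset_{offset}m" / "images").glob("*.png")):
            gt = iio.imread(gt_path).astype(np.float32) / 255.0
            # Poses and intrinsics are assumed to be stored alongside each test image.
            pose = np.load(gt_path.with_suffix(".pose.npy"))
            intrinsics = np.load(seq_dir / "intrinsics.npy")
            pred = render_fn(pose, intrinsics)
            psnrs.append(peak_signal_noise_ratio(gt, pred, data_range=1.0))
            ssims.append(structural_similarity(gt, pred, channel_axis=-1, data_range=1.0))
        results[offset] = {"PSNR": float(np.mean(psnrs)), "SSIM": float(np.mean(ssims))}
    return results
```

Reporting metrics per offset, rather than a single average, is what exposes the cross-lane degradation the benchmark is designed to measure.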

Paper Structure

This paper contains 17 sections, 9 figures, and 12 tables.

Figures (9)

  • Figure 1: Our dataset encompasses six distinct scenes, each involving the vehicle following an on-road trajectory. To generate training data for the cameras and LiDAR sensor, we sample $150$ waypoints along each trajectory. The trajectory is highlighted in red.
  • Figure 2: The composition of our training set and testing set. The training set consists of three RGB cameras ('front', 'left-front', and 'right-front') mounted on our vehicle. For the test set, we sample image sequences along trajectories that run parallel to the vehicle's route, comprising four distinct test trajectories offset from the vehicle's trajectory by 0 meters, 1 meter, 2 meters, and 4 meters, respectively. The sampling interval for the test set is five times the sampling interval used for the training set (a small waypoint-offset sketch illustrating this construction follows the figure list).
  • Figure 3: Visualization of the rendered images using DC-Gaussian and EmerNeRF with different offsets (i.e., 0m, 2m, and 4m) in two scenes. Areas of notable difference are highlighted, with boxes marking regions where results are better or worse.
  • Figure 4: Visualization of the rendered images and depth maps using 3D-GS, GaussianPro, and PVG with different offsets (i.e., 0m, 2m, and 4m) in two scenes. Areas of notable difference are highlighted, with boxes marking regions where results are better.
  • Figure 5: Using EmerNeRF as an example. Left column: novel-view-synthesis results under different offsets and camera numbers, with areas of notable difference highlighted. Right column: the performance improvement of the 3-camera setting over the 1-camera setting in terms of PSNR, SSIM, and LPIPS.
  • ...and 4 more figures
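
Figure 2's test-set construction, parallel trajectories offset by 0 m, 1 m, 2 m, and 4 m and sampled five times more sparsely than the training trajectory, can be sketched as follows. The waypoint format (x, y, yaw) and helper names are assumptions for illustration, not the dataset's actual generation code.

```python
# Sketch of how cross-lane test trajectories could be derived from training
# waypoints (hypothetical waypoint format: rows of (x, y, yaw) in a z-up world frame).
import numpy as np

TEST_OFFSETS_M = [0.0, 1.0, 2.0, 4.0]
TEST_SAMPLING_STRIDE = 5  # test views are sampled 5x more sparsely than training views


def lateral_offset_trajectory(waypoints: np.ndarray, offset_m: float) -> np.ndarray:
    """Shift each (x, y, yaw) waypoint sideways by `offset_m` meters.

    The lateral direction is taken perpendicular to the heading (positive = left),
    so the resulting trajectory runs parallel to the original driving lane.
    """
    x, y, yaw = waypoints[:, 0], waypoints[:, 1], waypoints[:, 2]
    left = np.stack([-np.sin(yaw), np.cos(yaw)], axis=-1)  # unit vector 90 deg left of heading
    shifted_xy = np.stack([x, y], axis=-1) + offset_m * left
    return np.concatenate([shifted_xy, yaw[:, None]], axis=-1)


def build_test_trajectories(train_waypoints: np.ndarray) -> dict:
    """Return one sparsely sampled, laterally offset trajectory per test offset."""
    sparse = train_waypoints[::TEST_SAMPLING_STRIDE]
    return {off: lateral_offset_trajectory(sparse, off) for off in TEST_OFFSETS_M}
```

The 0 m trajectory coincides with the training lane (pure interpolation), while the 1 m, 2 m, and 4 m trajectories probe increasingly aggressive extrapolation away from the training views.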