Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

Junhao Ge, Zuhong Liu, Longteng Fan, Yifan Jiang, Jiaqi Su, Yiming Li, Zhejun Zhang, Siheng Chen

TL;DR

This work addresses the data scarcity challenge in end-to-end autonomous driving by introducing SceneCrafter, a high-fidelity, interactive simulator based on 3D Gaussian Splatting (3DGS) that supports synthetic data generation and closed-loop evaluation. It couples a Scene Controller (adaptive traffic generation with route-based and trigger-based spawning, plus an Adaptive Kinematic Model with learnable parameters) with a Scene Renderer (Gaussian-based background/foreground rendering, ground-height estimation, and directional shadows) to produce spatio-temporally consistent scenes. The framework runs in two modes: synthetic data generation using an expert planner, and closed-loop evaluation with a learned policy, so the same simulator serves both training and testing of end-to-end models. Experiments on Waymo-scale data demonstrate realistic rendering, improved generalization when augmenting real data with synthetic logs (notably in limited-data regimes), and stronger closed-loop performance when models are fine-tuned with synthetic data. Overall, SceneCrafter offers a practical pathway to scalable, realistic synthetic data and rigorous end-to-end AD evaluation, with potential impact on data efficiency and model robustness in real-world driving.
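The two modes can be illustrated with a deliberately tiny, one-dimensional sketch. It is not SceneCrafter's API: `ToySimulator`, `expert_planner`, `run_episode`, and all numeric values are hypothetical stand-ins chosen for illustration, and the "renderer" returns a small dictionary rather than camera images. The point is only that the same loop (update traffic, render an observation, query a planner, move the ego vehicle) yields a training log when driven by an expert planner and an evaluation result when driven by the policy under test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Planner = Callable[[Observation], float]  # maps an observation to a target speed (m/s)


@dataclass
class ToySimulator:
    """Hypothetical stand-in for a controller + renderer pair (1D lane)."""
    dt: float = 0.5              # simulation step in seconds (assumed value)
    lead_position: float = 30.0  # slow lead vehicle ahead of ego, in metres
    lead_speed: float = 2.0      # lead vehicle speed in m/s

    def step_traffic(self) -> None:
        # Controller role: advance the other agent by one time step.
        self.lead_position += self.lead_speed * self.dt

    def render(self, ego_position: float, ego_speed: float) -> Observation:
        # Renderer role: produce what the planner gets to see.
        return {"gap": self.lead_position - ego_position, "speed": ego_speed}


def expert_planner(obs: Observation) -> float:
    # Rule-based expert: stop when the gap to the lead vehicle is small.
    return 0.0 if obs["gap"] < 10.0 else 8.0


def run_episode(planner: Planner, steps: int = 20) -> Tuple[List[Tuple[Observation, float]], bool]:
    """Run one closed-loop episode; return the (observation, action) log and a collision flag."""
    sim = ToySimulator()
    ego_position, ego_speed = 0.0, 8.0
    log: List[Tuple[Observation, float]] = []
    for _ in range(steps):
        sim.step_traffic()
        obs = sim.render(ego_position, ego_speed)
        ego_speed = planner(obs)
        ego_position += ego_speed * sim.dt
        log.append((obs, ego_speed))
        if sim.lead_position - ego_position <= 0.0:
            return log, True  # ego reached the lead vehicle: collision
    return log, False


# Mode 1: data generation. The expert's log becomes synthetic training data.
synthetic_log, expert_collided = run_episode(expert_planner)

# Mode 2: closed-loop evaluation. A policy that ignores traffic is scored by outcome.
_, naive_collided = run_episode(lambda obs: 8.0)
```

In this toy setting the expert completes the episode without a collision, while the constant-speed policy runs into the lead vehicle, which is the kind of interaction-dependent failure that open-loop replay of a fixed log cannot expose.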

Abstract

End-to-end (E2E) autonomous driving (AD) models require diverse, high-quality data to perform well across various driving scenarios. However, collecting large-scale real-world data is expensive and time-consuming, making high-fidelity synthetic data essential for enhancing data diversity and model robustness. Existing driving simulators for synthetic data generation have significant limitations: game-engine-based simulators struggle to produce realistic sensor data, while NeRF-based and diffusion-based methods face efficiency challenges. Additionally, recent simulators designed for closed-loop evaluation provide limited interaction with other vehicles, failing to simulate complex real-world traffic dynamics. To address these issues, we introduce SceneCrafter, a realistic, interactive, and efficient AD simulator based on 3D Gaussian Splatting (3DGS). SceneCrafter not only efficiently generates realistic driving logs across diverse traffic scenarios but also enables robust closed-loop evaluation of end-to-end models. Experimental results demonstrate that SceneCrafter serves as both a reliable evaluation platform and an efficient data generator that significantly improves end-to-end model generalization.

Paper Structure

This paper contains 28 sections, 11 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: SceneCrafter is a high-fidelity simulator capable of generating realistic synthetic driving data and providing an effective closed-loop evaluation for end-to-end autonomous driving models. Given real-world datasets, SceneCrafter can reconstruct dynamic driving scenes and incorporate interactive traffic flows to generate novel, realistic, and consistent scenarios.
  • Figure 2: Overview of SceneCrafter framework. Simulation is initialized with world configs, map topology and camera configs. Scene Controller updates interactive traffic flow, based on which Scene Renderer generates realistic driving scenes. End-to-End Driving Model plans future ego trajectory depending on simulation modes including synthetic data generation and closed-loop evaluation.
  • Figure 3: (a) illustrates the original scenario, where no vehicles are present except the ego vehicle, which simply maintains a constant speed. In contrast, (b), (c), and (d) demonstrate how the ego vehicle's behavior is altered by the blocking agent, while the ego vehicle in turn affects the surrounding agents. An example video is given in the supplementary material.
  • Figure 4: (a) and (b) show that ego trajectories generated by AKM are smoother and more realistic than those by BM. (c) and (d) illustrate that end-to-end planning result deviates more from the simulated GT trajectory with AKM than with BM, which indicates AKM aligns better with real world dynamics. The color bar indicates vehicle velocity norms.
  • Figure 5: Qualitative comparison, from the front camera view, of the original end-to-end models and those fine-tuned with our synthetic data. The red car represents the ego vehicle controlled by $\pi_{AD}$. The red and green arrows indicate the locations where a collision occurs or is avoided, respectively. Fine-tuned models exhibit more reasonable driving behaviors in complex scenarios than models trained solely on real data, showcasing enhanced generalization.
  • ...and 8 more figures