Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

Junhao Ge; Zuhong Liu; Longteng Fan; Yifan Jiang; Jiaqi Su; Yiming Li; Zhejun Zhang; Siheng Chen

TL;DR

This work addresses the data scarcity challenge in end-to-end autonomous driving by introducing SceneCrafter, a high-fidelity, interactive simulator based on 3D Gaussian Splatting (3DGS) that supports both synthetic data generation and closed-loop evaluation. It couples a Scene Controller (adaptive traffic generation with route-based and trigger-based spawning, plus an Adaptive Kinematic Model with learnable parameters) with a Scene Renderer (Gaussian-based background and foreground rendering, ground-height estimation, and directional shadows) to produce spatiotemporally consistent scenes. The framework operates in two modes: synthetic data generation with an expert planner, and closed-loop evaluation of a learned policy, so the same simulator supports both training and testing of end-to-end models. Experiments on Waymo-scale data demonstrate realistic rendering, improved generalization when real data is augmented with synthetic logs (notably in limited-data regimes), and stronger closed-loop performance when models are fine-tuned with synthetic data. Overall, SceneCrafter offers a practical pathway to scalable, realistic synthetic data and rigorous end-to-end AD evaluation, with potential gains in data efficiency and model robustness for real-world driving.
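
The summary does not say what form the Adaptive Kinematic Model takes or which of its parameters are learned. As a rough illustration only, the sketch below assumes a standard kinematic bicycle model whose wheelbase is treated as the hypothetical "learnable parameter", fitted to a logged trajectory by a simple grid search over candidate values. The names `State`, `bicycle_step`, `rollout`, and `fit_wheelbase` are illustrative and are not taken from SceneCrafter; a real system would likely use a gradient-based optimizer and more parameters.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class State:
    x: float    # position [m]
    y: float    # position [m]
    yaw: float  # heading [rad]
    v: float    # speed [m/s]


def bicycle_step(s: State, accel: float, steer: float, wheelbase: float, dt: float) -> State:
    """Advance a kinematic bicycle model by one time step (explicit Euler)."""
    return State(
        x=s.x + s.v * math.cos(s.yaw) * dt,
        y=s.y + s.v * math.sin(s.yaw) * dt,
        yaw=s.yaw + s.v / wheelbase * math.tan(steer) * dt,
        v=max(0.0, s.v + accel * dt),  # no reversing in this sketch
    )


def rollout(s0: State, controls: Sequence[Tuple[float, float]], wheelbase: float, dt: float) -> List[State]:
    """Simulate a sequence of (acceleration, steering) commands from s0."""
    states = [s0]
    for accel, steer in controls:
        states.append(bicycle_step(states[-1], accel, steer, wheelbase, dt))
    return states


def position_error(pred: Sequence[State], ref: Sequence[State]) -> float:
    """Sum of squared position errors between two trajectories."""
    return sum((p.x - r.x) ** 2 + (p.y - r.y) ** 2 for p, r in zip(pred, ref))


def fit_wheelbase(s0: State, controls, ref, dt: float, candidates: Sequence[float]) -> float:
    """Hypothetical parameter fit: choose the wheelbase that best reproduces the logged trajectory."""
    return min(candidates, key=lambda wb: position_error(rollout(s0, controls, wb, dt), ref))


# Usage: synthesize a "logged" trajectory with a 2.8 m wheelbase, then recover it.
s0 = State(x=0.0, y=0.0, yaw=0.0, v=10.0)
controls = [(0.5, 0.1)] * 20
ref = rollout(s0, controls, wheelbase=2.8, dt=0.1)
candidates = [round(2.0 + 0.1 * i, 1) for i in range(16)]  # 2.0 .. 3.5 m
best = fit_wheelbase(s0, controls, ref, dt=0.1, candidates=candidates)
print(best)
```

Fitting vehicle parameters to logged motion, rather than fixing them by hand, is one plausible way a kinematic model could adapt to the varied vehicles seen in real driving logs.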

Abstract

End-to-end (E2E) autonomous driving (AD) models require diverse, high-quality data to perform well across various driving scenarios. However, collecting large-scale real-world data is expensive and time-consuming, making high-fidelity synthetic data essential for enhancing data diversity and model robustness. Existing driving simulators for synthetic data generation have significant limitations: game-engine-based simulators struggle to produce realistic sensor data, while NeRF-based and diffusion-based methods face efficiency challenges. Additionally, recent simulators designed for closed-loop evaluation provide limited interaction with other vehicles and therefore fail to simulate complex real-world traffic dynamics. To address these issues, we introduce SceneCrafter, a realistic, interactive, and efficient AD simulator based on 3D Gaussian Splatting (3DGS). SceneCrafter not only efficiently generates realistic driving logs across diverse traffic scenarios but also enables robust closed-loop evaluation of end-to-end models. Experimental results demonstrate that SceneCrafter serves as both a reliable evaluation platform and an efficient data generator that significantly improves end-to-end model generalization.
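
The two uses described above differ mainly in who drives the ego vehicle and what is kept. The toy sketch below is a hypothetical illustration, not SceneCrafter's code: a one-dimensional car-following scene stands in for the rendered 3DGS scene, `observe()` stands in for sensor rendering, a proportional gap-keeping rule plays the expert planner, and a differently tuned linear rule plays the learned policy. In generation mode the expert's (observation, action) pairs are recorded as a driving log; in closed-loop evaluation each action of the policy changes the next observation, and the minimum gap to the lead vehicle serves as a crude collision metric.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Obs = Tuple[float, float]  # (gap to lead vehicle [m], relative speed [m/s])
Policy = Callable[[Obs], float]


@dataclass
class Scene:
    """Hypothetical 1-D scene: ego follows a constant-speed lead vehicle."""
    ego_pos: float = 0.0
    ego_vel: float = 10.0
    lead_pos: float = 30.0
    lead_vel: float = 8.0

    def observe(self) -> Obs:
        # Stand-in for rendered sensor data.
        return (self.lead_pos - self.ego_pos, self.lead_vel - self.ego_vel)

    def step(self, accel: float, dt: float) -> None:
        self.ego_vel = max(0.0, self.ego_vel + accel * dt)
        self.ego_pos += self.ego_vel * dt
        self.lead_pos += self.lead_vel * dt


def expert_planner(obs: Obs) -> float:
    """Hypothetical expert: proportional control toward a 15 m gap."""
    gap, rel_v = obs
    return 0.5 * (gap - 15.0) + 1.0 * rel_v


def run(scene: Scene, policy: Policy, steps: int, dt: float, record: bool) -> Tuple[List[Tuple[Obs, float]], float]:
    """Drive the scene with `policy`; optionally log (obs, action) pairs; return log and minimum gap."""
    log: List[Tuple[Obs, float]] = []
    min_gap = float("inf")
    for _ in range(steps):
        obs = scene.observe()
        action = policy(obs)
        if record:
            log.append((obs, action))
        scene.step(action, dt)  # closed loop: the action shapes the next observation
        min_gap = min(min_gap, scene.observe()[0])
    return log, min_gap


# Mode 1: synthetic data generation with the expert planner.
gen_log, expert_min_gap = run(Scene(), expert_planner, steps=100, dt=0.1, record=True)

# Mode 2: closed-loop evaluation of a (stand-in) learned policy.
learned_policy: Policy = lambda obs: 0.4 * (obs[0] - 15.0) + 0.8 * obs[1]
eval_log, learned_min_gap = run(Scene(), learned_policy, steps=100, dt=0.1, record=False)
print(len(gen_log), learned_min_gap > 0.0)
```

Sharing one `run` loop between both modes reflects the idea that a single interactive simulator can serve as both a data generator and an evaluation platform.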
