Data Scaling Laws for End-to-End Autonomous Driving
Alexander Naumann, Xunjiang Gu, Tolga Dimlioglu, Mariusz Bojarski, Alperen Degirmenci, Alexander Popov, Devansh Bisla, Marco Pavone, Urs Müller, Boris Ivanovic
TL;DR
The study investigates data scaling laws for a simple end-to-end autonomous driving stack trained on 16 to 8192 hours of data, evaluating both open-loop performance and closed-loop performance in NVIDIA DRIVE Sim. The prediction target is $T=15$ future waypoints (3 seconds at 5 Hz). Four scaling-law estimators (M1–M4) and an adaptive training schedule are used to analyze data efficiency and to extrapolate how much data a target improvement requires. The results show that scaling data and model capacity together accelerates performance gains, with clear differences across driving actions (lane keeping, lane changes, turns) and across sensor and backbone configurations (1 vs 3 cameras, ResNet-18 vs ResNet-50). However, closed-loop gains saturate earlier than open-loop gains, revealing a gap between open- and closed-loop performance that is likely due to covariate shift and highlighting the need for closed-loop or corrective data strategies in deployment. The work provides data-driven guidance for resource allocation in end-to-end AV development and points to future work on additional modalities, temporal modeling, and advanced end-to-end architectures to bridge open- and closed-loop performance.
Abstract
Autonomous vehicle (AV) stacks have traditionally relied on decomposed approaches, with separate modules handling perception, prediction, and planning. However, this design introduces information loss during inter-module communication, increases computational overhead, and can lead to compounding errors. To address these challenges, recent works have proposed architectures that integrate all components into an end-to-end differentiable model, enabling holistic system optimization. This shift emphasizes data engineering over software integration, offering the potential to enhance system performance by simply scaling up training resources. In this work, we evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours with both open-loop metrics and closed-loop simulations. Specifically, we investigate how much additional training data is needed to achieve a target performance gain, e.g., a 5% improvement in motion prediction accuracy. By understanding the relationship between model performance and training dataset size, we aim to provide insights for data-driven decision-making in autonomous driving development.
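The question of how much extra data a target gain requires can be illustrated with a minimal sketch. It assumes a pure power law, error = a * hours^(-b), with no irreducible-error term; this functional form, the helper names `fit_power_law` and `data_multiplier`, and the synthetic data points are all illustrative assumptions, not the estimators or results of the paper.

```python
import numpy as np


def fit_power_law(hours, errors):
    """Least-squares fit of error = a * hours**(-b) in log-log space.

    Assumes a pure power law (no irreducible-error floor); this is an
    illustrative simplification, not the paper's estimator.
    """
    slope, intercept = np.polyfit(np.log(hours), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def data_multiplier(b, relative_gain):
    """Factor by which the dataset must grow to cut error by `relative_gain`.

    From a * (k * D)**(-b) = (1 - g) * a * D**(-b) it follows that
    k = (1 - g) ** (-1 / b).
    """
    return (1.0 - relative_gain) ** (-1.0 / b)


# Synthetic points generated from a known power law (hypothetical values,
# chosen only so the fit can be checked; not measurements from the paper).
hours = np.array([16, 64, 256, 1024, 4096], dtype=float)
errors = 2.0 * hours ** -0.1

a, b = fit_power_law(hours, errors)
factor = data_multiplier(b, 0.05)
print(f"a={a:.3f}, b={b:.3f}, data multiplier for a 5% gain: {factor:.2f}x")
```

Under these assumptions, an exponent of b = 0.1 means a 5% error reduction needs about 1.67 times as much data. The smaller the exponent, the faster this multiplier grows, which is why the fitted exponent matters so much for planning data collection.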
