Hidden Biases of End-to-End Driving Models
Bernhard Jaeger, Kashyap Chitta, Andreas Geiger
TL;DR
The paper investigates why end-to-end driving methods improve on CARLA and identifies two recurring biases: a lateral recovery shortcut tied to target-point conditioning, and the ambiguity that arises when multimodal future velocities are encoded as waypoints. It demonstrates that a transformer-based pooling mechanism and data augmentations can mitigate the shortcut, and proposes disentangling target speeds from path predictions, with a confidence-weighted controller to handle uncertainty. Building on these insights, TF++ (TransFuser++) combines architectural refinements, two-stage training, and dataset scaling to achieve state-of-the-art results on the Longest6 and LAV benchmarks while reducing data requirements. The work highlights the importance of understanding architectural biases and representation ambiguities for robust, interpretable end-to-end driving systems, and discusses limitations and broader implications for real-world deployment.
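To illustrate what a confidence-weighted target speed could look like, the following is a minimal sketch and not the authors' implementation. It assumes the model emits one logit per discrete speed class; the class values in `SPEED_CLASSES`, the function names, and the example logits are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical speed classes in m/s (an assumption, not taken from the paper).
SPEED_CLASSES = [0.0, 2.0, 5.0, 8.0]


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_speed(logits, class_speeds):
    """Weight each candidate speed by the model's confidence in it."""
    probs = softmax(logits)
    return sum(p * v for p, v in zip(probs, class_speeds))


# A confident prediction stays close to the speed of the top class.
confident = confidence_weighted_speed([0.0, 0.0, 0.0, 10.0], SPEED_CLASSES)

# A prediction split between "stop" and "fast" yields a cautious middle speed.
uncertain = confidence_weighted_speed([2.0, 0.0, 0.0, 2.0], SPEED_CLASSES)
```

Under these assumptions, the commanded speed drops as the predicted distribution spreads out, and the uncertainty stays visible in the class probabilities rather than being hidden inside a single regressed value.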
Abstract
End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing down. We investigate the drawbacks of these biases and identify principled alternatives. By incorporating our insights, we develop TF++, a simple end-to-end method that ranks first on the Longest6 and LAV benchmarks, gaining 11 driving score over the best prior work on Longest6.
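The longitudinal averaging effect in (2) can be sketched with a toy calculation. This sketch assumes a regressor that outputs the mean over possible futures, which is what a squared-error loss would yield; other losses behave differently. It also assumes only two futures, "stop" and "go", along a 1D path. The function names, time step, and speeds are hypothetical.

```python
def expected_waypoints(p_go, go_speed, n=4, dt=0.5):
    """Mean waypoint positions (distance along the path, in metres)
    when the future is 'go' with probability p_go and 'stop' otherwise."""
    go = [go_speed * dt * (i + 1) for i in range(n)]  # constant-speed future
    stop = [0.0] * n                                  # vehicle stays in place
    return [p_go * g + (1.0 - p_go) * s for g, s in zip(go, stop)]


def implied_speed(waypoints, dt=0.5):
    """Speed a controller would read off the spacing of the first two waypoints."""
    return (waypoints[1] - waypoints[0]) / dt


# With equal belief in both futures, the averaged waypoints are spaced
# as if the car were driving at half of the 'go' speed.
half = implied_speed(expected_waypoints(p_go=0.5, go_speed=8.0))

# With full confidence in 'go', the spacing matches the 'go' speed.
full = implied_speed(expected_waypoints(p_go=1.0, go_speed=8.0))
```

In this toy setting, uncertainty about whether to stop compresses the predicted waypoints, so the vehicle slows down even though neither individual future contains that intermediate speed.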
