PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving
Simon Gerstenecker, Andreas Geiger, Katrin Renz
TL;DR
The paper addresses robustness and generalization gaps in autonomous driving through a systematic analysis of model failures in CARLA, arguing that the current emphasis on benchmark performance masks biases and shortcut learning. PlanT 2.0 is a lightweight, object-centric planning transformer that extends PlanT with richer object inputs, an SD map BEV representation, expanded sensing range, and a decoupled output design. Because its object-level inputs are easy to perturb, it supports controlled failure analysis. The authors report state-of-the-art results on the CARLA validation routes ($NDS=28.6$), Bench2Drive, and Longest6 v2, but identify systematic failures: a lack of scene understanding caused by low obstacle diversity, overfitting to a fixed set of expert trajectories, and exploitable shortcuts learned from rigid expert behaviors, all of which underscore the model's dependence on its training data. They advocate data-centric development with richer, more robust, and less biased datasets, and release their code and model as open source to support further analysis of biases and flaws.
Abstract
Most recent work in autonomous driving has prioritized benchmark performance and methodological innovation over in-depth analysis of model failures, biases, and shortcut learning. This has led to incremental improvements without a deep understanding of the current failures. While it is straightforward to look at situations where the model fails, it is hard to understand the underlying reason. This motivates us to conduct a systematic study, where inputs to the model are perturbed and the predictions observed. We introduce PlanT 2.0, a lightweight, object-centric planning transformer designed for autonomous driving research in CARLA. The object-level representation enables controlled analysis, as the input can be easily perturbed (e.g., by changing the location or adding or removing certain objects), in contrast to sensor-based models. To tackle the scenarios newly introduced by the challenging CARLA Leaderboard 2.0, we introduce multiple upgrades to PlanT, achieving state-of-the-art performance on Longest6 v2, Bench2Drive, and the CARLA validation routes. Our analysis exposes insightful failures, such as a lack of scene understanding caused by low obstacle diversity, rigid expert behaviors leading to exploitable shortcuts, and overfitting to a fixed set of expert trajectories. Based on these findings, we argue for a shift toward data-centric development, with a focus on richer, more robust, and less biased datasets. We open-source our code and model at https://github.com/autonomousvision/plant2.
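The perturbation study described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SceneObject`, `toy_planner`, `shift`, `remove`, and `sensitivity` are hypothetical names, and the rule-based `toy_planner` is only a stand-in for a learned planner, not the PlanT 2.0 model or its interface. The idea is to edit one object in the input, re-run the planner, and measure how far the predicted waypoints move.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object-level input; the real model uses richer attributes."""
    kind: str   # e.g. "vehicle" or "static_obstacle"
    x: float    # longitudinal position in the ego frame (m), forward is positive
    y: float    # lateral position in the ego frame (m)


Scene = List[SceneObject]
Waypoints = List[Tuple[float, float]]
Planner = Callable[[Scene], Waypoints]


def toy_planner(scene: Scene) -> Waypoints:
    """Stand-in planner (assumption): drive straight, stop 2 m before the
    nearest object that lies ahead within 1.5 m of the lane center."""
    in_lane = [o.x for o in scene if o.x > 0 and abs(o.y) < 1.5]
    stop_x = max(min(in_lane) - 2.0, 0.0) if in_lane else float("inf")
    return [(min(2.0 * k, stop_x), 0.0) for k in range(1, 5)]


def shift(scene: Scene, index: int, dx: float = 0.0, dy: float = 0.0) -> Scene:
    """Return a copy of the scene with one object moved by (dx, dy)."""
    out = list(scene)
    out[index] = replace(out[index], x=out[index].x + dx, y=out[index].y + dy)
    return out


def remove(scene: Scene, index: int) -> Scene:
    """Return a copy of the scene with one object deleted."""
    return scene[:index] + scene[index + 1:]


def sensitivity(planner: Planner, scene: Scene, perturbed: Scene) -> float:
    """Largest displacement (m) between matching waypoints of the two plans."""
    base, new = planner(scene), planner(perturbed)
    return max(math.hypot(bx - nx, by - ny) for (bx, by), (nx, ny) in zip(base, new))


# One obstacle in the ego lane and one vehicle in a neighboring lane.
scene = [SceneObject("static_obstacle", 6.0, 0.0), SceneObject("vehicle", 20.0, 4.0)]

print(sensitivity(toy_planner, scene, remove(scene, 0)))         # in-lane obstacle removed
print(sensitivity(toy_planner, scene, remove(scene, 1)))         # off-lane vehicle removed
print(sensitivity(toy_planner, scene, shift(scene, 0, dy=3.0)))  # obstacle moved out of lane
```

In this toy setting, a large sensitivity for the in-lane obstacle and zero sensitivity for the off-lane vehicle is the expected behavior. With a learned planner, the informative cases are the ones that deviate from that expectation, for example a plan that does not change when a blocking obstacle is moved or replaced by an unfamiliar object.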
