Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models
Jorge Daniel Rodríguez-Vidal, Gabriel Villalonga, Diego Porres, Antonio M. López Peña
TL;DR
This work addresses the waypoint-action gap in end-to-end autonomous driving with a differentiable vehicle-model framework. A modular three-component operator, $\mathcal{F}_\phi$, lifts action sequences into ego-frame waypoint trajectories, so that action-based policies can be trained and evaluated within waypoint-based benchmarks. The operator is instantiated with the Kinematic Bicycle Model ($\mathcal{F}_{\text{KBM}}$), a Continuous Curvature Path Planner ($\mathcal{F}_{\text{CCPP}}$), or an MLP, each providing a unified, differentiable bridge that preserves vehicle dynamics while supporting gradient-based training. Training is end-to-end with a waypoint loss $L_{\mathrm{wp}}$, whose gradients flow through the lifting operator. Across NAVSIM (navhard and navtest), Bench2Drive, and CARLA, the framework achieves state-of-the-art or near-state-of-the-art results for action-based policies, with improved training stability and stronger offline–online correlation.
Abstract
End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by the nature of their outputs: (i) waypoint-based models that predict a future trajectory, and (ii) action-based models that directly output throttle, steering, and brake commands. Most recent benchmark protocols and training pipelines are waypoint-based, which makes action-based policies harder to train and compare and slows their progress. To bridge this waypoint-action gap, we propose a novel, differentiable vehicle-model framework that rolls out predicted action sequences into their corresponding ego-frame waypoint trajectories, so that supervision is applied in waypoint space. Our approach enables action-based architectures to be trained and evaluated, for the first time, within waypoint-based benchmarks without modifying the underlying evaluation protocol. We extensively evaluate our framework across multiple challenging benchmarks and observe consistent improvements over the baselines. In particular, on NAVSIM \texttt{navhard} our approach achieves state-of-the-art performance. Our code will be made publicly available upon acceptance.
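To make the action-to-waypoint rollout concrete, the following is a minimal sketch of a kinematic bicycle rollout followed by a waypoint-space loss. It is an illustration under stated assumptions, not the authors' implementation: the function names `kbm_rollout` and `waypoint_loss`, the rear-axle reference point, the wheelbase and time step values, the use of (acceleration, steering angle) pairs in place of raw throttle, steer, and brake, and the mean Euclidean form of the loss are all hypothetical choices. The sketch uses plain Python for readability; in practice the same arithmetic would be written in an automatic-differentiation framework so that gradients can flow from the loss back to the actions.

```python
import math


def kbm_rollout(actions, v0, wheelbase=2.7, dt=0.5):
    """Roll out (acceleration, steering angle) pairs into ego-frame waypoints.

    Assumptions (not from the paper): rear-axle reference point, forward-Euler
    integration, wheelbase in metres, dt in seconds, steering angle in radians.
    The ego frame starts at the origin with zero heading.
    """
    x, y, yaw, v = 0.0, 0.0, 0.0, v0
    waypoints = []
    for accel, steer in actions:
        # Advance the position using the current heading and speed.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        # Update the heading from the steering angle, then the speed.
        yaw += v / wheelbase * math.tan(steer) * dt
        v += accel * dt
        waypoints.append((x, y))
    return waypoints


def waypoint_loss(pred, target):
    """Mean Euclidean distance between predicted and target waypoints.

    This particular form is an assumption made for illustration.
    """
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, target)]
    return sum(dists) / len(dists)


# Hypothetical usage: a gentle left turn compared against a straight reference.
straight = kbm_rollout([(0.0, 0.0)] * 4, v0=2.0)
left_turn = kbm_rollout([(0.0, 0.1)] * 4, v0=2.0)
loss = waypoint_loss(left_turn, straight)
```

Every operation in the rollout is smooth in the actions, which is the property that lets a waypoint loss supervise an action-based policy.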
