Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning
Bozhou Zhang, Nan Song, Xin Jin, Li Zhang
TL;DR
BridgeAD addresses the limited use of historical information in end-to-end autonomous driving by reformulating motion and planning queries as multi-step queries, one per future time step, and integrating them with perception and planning through history-aware modules. A memory queue stores past motion and planning queries, enabling step-wise interactions via cross-attention and dual self-attention layers, while a step-level Mot2Plan interaction enforces cross-time-step consistency. The framework achieves state-of-the-art open-loop and superior closed-loop performance on the nuScenes and NeuroNCAP benchmarks, with notable improvements in perception, motion prediction, and planning coherence. By bridging past and future through history-enhanced perception and planning, BridgeAD offers a cohesive, scalable approach to safer, more reliable end-to-end autonomous driving.
Abstract
End-to-end autonomous driving unifies tasks in a differentiable framework, enabling planning-oriented optimization and attracting growing attention. Current methods aggregate historical information either through dense historical bird's-eye-view (BEV) features or by querying a sparse memory bank, following paradigms inherited from detection. However, we argue that these paradigms either omit historical information in motion planning or fail to align with its multi-step nature, which requires predicting or planning multiple future time steps. In line with the philosophy that the future is a continuation of the past, we propose BridgeAD, which reformulates motion and planning queries as multi-step queries to differentiate the queries for each future time step. This design enables the effective use of historical prediction and planning by applying them to the appropriate parts of the end-to-end system according to their time steps, which improves both perception and motion planning. Specifically, historical queries for the current frame are combined with perception, while queries for future frames are integrated with motion planning. In this way, we bridge the gap between past and future by aggregating historical insights at every time step, enhancing the overall coherence and accuracy of the end-to-end autonomous driving pipeline. Extensive experiments on the nuScenes dataset in both open-loop and closed-loop settings demonstrate that BridgeAD achieves state-of-the-art performance.
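The time-step routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MemoryQueue`, the method `route`, and the use of strings as stand-ins for query feature vectors are all hypothetical, and the sketch assumes that consecutive frames are spaced exactly one prediction step apart. It only shows how a query predicted at a past frame for a given future step can be matched either to the current frame (for perception) or to a still-future step (for motion planning).

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class MemoryQueue:
    """Hypothetical FIFO store of multi-step queries from past frames."""

    def __init__(self, max_frames: int) -> None:
        # The deque drops the oldest frame once max_frames is exceeded.
        self.frames: Deque[Tuple[int, List[str]]] = deque(maxlen=max_frames)

    def push(self, frame_index: int, step_queries: List[str]) -> None:
        # step_queries[i] targets absolute frame (frame_index + i + 1).
        # Assumption: one prediction step equals one frame interval.
        self.frames.append((frame_index, list(step_queries)))

    def route(self, current_frame: int) -> Tuple[List[str], Dict[int, List[str]]]:
        """Split stored queries by the time step they refer to.

        Returns (perception, planning):
          perception: queries whose target is the current frame.
          planning:   maps a future step offset (1, 2, ...) to the
                      historical queries that target that same step.
        Queries whose target frame is already in the past are ignored.
        """
        perception: List[str] = []
        planning: Dict[int, List[str]] = {}
        for frame_index, step_queries in self.frames:
            for i, query in enumerate(step_queries):
                offset = frame_index + i + 1 - current_frame
                if offset == 0:
                    perception.append(query)
                elif offset > 0:
                    planning.setdefault(offset, []).append(query)
        return perception, planning


queue = MemoryQueue(max_frames=2)
queue.push(0, ["f0s1", "f0s2", "f0s3"])  # made at frame 0 for frames 1..3
queue.push(1, ["f1s1", "f1s2", "f1s3"])  # made at frame 1 for frames 2..4
perception, planning = queue.route(current_frame=2)
print(perception)  # ['f0s2', 'f1s1']
print(planning)    # {1: ['f0s3', 'f1s2'], 2: ['f1s3']}
```

In this toy setup, the two queries that were predicted for frame 2 are routed to perception, while the remaining ones are grouped by the future step they describe. In the actual model the routed queries would be feature vectors combined through attention rather than collected into lists.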
