Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Bozhou Zhang, Nan Song, Xin Jin, Li Zhang

TL;DR

BridgeAD addresses the limited use of historical information in end-to-end autonomous driving by reformulating motion and planning queries as multi-step, time-step-specific constructs that are integrated with perception and planning through history-aware modules. A memory queue stores past motion and planning queries, enabling step-wise interactions via cross-attention and dual self-attention layers, while a step-level Mot2Plan interaction enforces cross-time-step consistency. The framework achieves state-of-the-art open-loop and superior closed-loop performance on nuScenes/NeuroNCAP benchmarks, with notable improvements in perception, motion prediction, and planning coherence. By bridging past and future through history-enhanced perception and planning, BridgeAD offers a cohesive, scalable approach to safer, more reliable end-to-end autonomous driving.

Abstract

End-to-end autonomous driving unifies tasks in a differentiable framework, enabling planning-oriented optimization and attracting growing attention. Current methods aggregate historical information either through dense historical bird's-eye-view (BEV) features or by querying a sparse memory bank, following paradigms inherited from detection. However, we argue that these paradigms either omit historical information in motion planning or fail to align with its multi-step nature, which requires predicting or planning multiple future time steps. In line with the philosophy that the future is a continuation of the past, we propose BridgeAD, which reformulates motion and planning queries as multi-step queries to differentiate the queries for each future time step. This design enables the effective use of historical prediction and planning by applying them to the appropriate parts of the end-to-end system based on their time steps, which improves both perception and motion planning. Specifically, historical queries for the current frame are combined with perception, while queries for future frames are integrated with motion planning. In this way, we bridge the gap between past and future by aggregating historical insights at every time step, enhancing the overall coherence and accuracy of the end-to-end autonomous driving pipeline. Extensive experiments on the nuScenes dataset in both open-loop and closed-loop settings demonstrate that BridgeAD achieves state-of-the-art performance.
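The time-step routing described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `MemoryQueue` and `route_history`, the use of strings in place of query tensors, and the assumption that one frame interval equals one prediction step are all hypothetical choices made for illustration. The idea shown is that a query stored `age` frames ago for step `s` refers to the step `s - age` relative to the current frame: offset 0 goes to perception, positive offsets go to motion planning, and negative offsets are already in the past.

```python
from collections import deque


class MemoryQueue:
    """Hypothetical FIFO cache of the multi-step queries from the last K frames."""

    def __init__(self, k_frames):
        # deque with maxlen drops the oldest frame automatically
        self.frames = deque(maxlen=k_frames)

    def push(self, frame_idx, step_queries):
        # step_queries[s] is the query predicted at frame_idx for time frame_idx + s
        self.frames.append((frame_idx, step_queries))


def route_history(queue, current_idx, horizon):
    """Split cached queries by the time step they refer to, relative to now.

    Assumes (for illustration) that frames and prediction steps share one interval.
    """
    for_perception = []                                  # queries that target the current frame
    for_planning = {s: [] for s in range(1, horizon + 1)}  # queries that target future steps
    for frame_idx, step_queries in queue.frames:
        age = current_idx - frame_idx
        for s, query in step_queries.items():
            rel = s - age  # step offset relative to the current frame
            if rel == 0:
                for_perception.append(query)
            elif 1 <= rel <= horizon:
                for_planning[rel].append(query)
            # rel < 0: the predicted time has already passed, so the query is dropped
    return for_perception, for_planning


# Usage: two cached frames, three future steps each, queried at frame 2.
queue = MemoryQueue(k_frames=2)
queue.push(0, {1: "f0s1", 2: "f0s2", 3: "f0s3"})
queue.push(1, {1: "f1s1", 2: "f1s2", 3: "f1s3"})
perception_hist, planning_hist = route_history(queue, current_idx=2, horizon=3)
```

In this toy run, `perception_hist` collects the two queries that were predicted for frame 2, while `planning_hist` groups the remaining queries by the future step they describe.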

Paper Structure

This paper contains 56 sections, 7 equations, 11 figures, 16 tables.

Figures (11)

  • Figure 1: The primary distinction between previous methods and ours lies in how historical information is aggregated. As depicted in (a), previous methods either interact with historical BEV features within the perception module or utilize a historical query memory bank. As shown in (b), our BridgeAD enhances end-to-end autonomous driving by incorporating historical prediction for the current frame into the perception module and historical prediction and planning for future frames into the motion planning module.
  • Figure 2: Overview of the BridgeAD framework: Multi-view images are first processed by the Image Encoder, after which both 3D objects and the vectorized map are perceived. (a) The memory queue caches $K$ past frames of historical motion and planning queries. (b) The Historical Mot2Det Fusion Module is proposed to enhance detection and tracking by leveraging historical motion queries for the current frame. In the motion planning component, (c) the History-Enhanced Motion Prediction Module and (d) the History-Enhanced Planning Module aggregate multi-step historical motion and planning queries into queries for the future frames. Finally, (e) the Step-Level Mot2Plan Interaction Module facilitates interaction between multi-step motion queries and planning queries for corresponding future time steps.
  • Figure 3: Qualitative results in the open-loop evaluation show that our BridgeAD accurately produces planning outputs.
  • Figure 4: Qualitative results in the closed-loop evaluation demonstrate that our BridgeAD effectively avoids collisions in safety-critical scenarios.
  • Figure 5: Further explanation about our BridgeAD.
  • ...and 6 more figures
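
The step-level interaction named in the Figure 2 caption, part (e), can be sketched as per-step attention: the planning query for a given future time step attends only to the motion queries for that same step. The following is a minimal pure-Python sketch under stated assumptions, not the paper's module: the function names are hypothetical, queries are plain lists of floats, and the learned projections, multi-head structure, and normalization of a real attention layer are omitted.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query (no learned projections)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def step_level_interaction(plan_queries, motion_queries):
    """Update each planning query using motion queries from the same time step only.

    plan_queries[t]   : planning query vector for future step t
    motion_queries[t] : list of agent motion query vectors for future step t
    """
    updated = []
    for plan_q, agents in zip(plan_queries, motion_queries):
        context = attend(plan_q, agents, agents)
        # Residual connection: keep the original query and add the attended context
        updated.append([p + c for p, c in zip(plan_q, context)])
    return updated


# Usage: two future steps, two agents per step, 2-dimensional queries.
plan = [[1.0, 0.0], [0.0, 1.0]]
motion = [
    [[0.5, 0.5], [1.0, -1.0]],  # agents at step 0
    [[2.0, -1.0], [0.0, 0.3]],  # agents at step 1
]
new_plan = step_level_interaction(plan, motion)
```

Because each step is processed independently, changing the motion queries at one step leaves the updated planning queries at every other step unchanged, which is the property that distinguishes this from attention over all steps at once.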