Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, Yadan Luo
TL;DR
MomAD addresses temporal instability in end-to-end autonomous driving through momentum-aware planning, which combines trajectory momentum via Topological Trajectory Matching (TTM) with perception momentum via the Momentum Planning Interactor (MPI). TTM uses the Hausdorff distance $d_H$ to align current trajectory proposals with the past path, while MPI cross-attends the selected planning query with historical queries to enrich long-horizon context. A robust instance denoising module improves planning stability during training, and a Trajectory Prediction Consistency (TPC) metric quantifies it. Across nuScenes, Turning-nuScenes, and Bench2Drive, MomAD achieves state-of-the-art planning metrics, improves long-horizon consistency (≥3 s), reduces the collision rate (by 26% on Turning-nuScenes over a 6 s horizon), and improves perception and motion prediction metrics, demonstrating robust performance under occlusions and dynamic conditions. The work introduces Turning-nuScenes and the TPC metric to better evaluate temporal consistency. It also discusses limitations such as mode collapse under teacher forcing, with future work exploring diffusion-based decoding for greater trajectory diversity.
Abstract
End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine trajectory predictions. MomAD comprises two core components: (1) Topological Trajectory Matching (TTM) employs the Hausdorff distance to select the optimal planning query that aligns with prior paths to ensure coherence; (2) Momentum Planning Interactor (MPI) cross-attends the selected planning query with historical queries to expand static and dynamic perception files. This enriched query, in turn, helps regenerate a long-horizon trajectory and reduce collision risks. To mitigate noise arising from dynamic environments and detection errors, we introduce robust instance denoising during training, enabling the planning model to focus on critical signals and improve its robustness. We also propose a novel Trajectory Prediction Consistency (TPC) metric to quantitatively assess planning stability. Experiments on the nuScenes dataset demonstrate that MomAD achieves superior long-term consistency (≥3 s) compared to SOTA methods. Moreover, evaluations on the curated Turning-nuScenes show that MomAD reduces the collision rate by 26% and improves TPC by 0.97 m (33.45%) over a 6 s prediction horizon, while closed-loop evaluation on Bench2Drive demonstrates up to a 16.3% improvement in success rate.
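The selection step in TTM can be illustrated with a small sketch. For two point sets $A$ and $B$, the Hausdorff distance is $d_H(A, B) = \max\{\sup_{a \in A} \inf_{b \in B} \lVert a - b \rVert,\ \sup_{b \in B} \inf_{a \in A} \lVert a - b \rVert\}$. The Python below picks, from several candidate trajectories, the one with the smallest $d_H$ to the previous plan. It is an illustration under assumptions, not the authors' implementation: the function names, the 2D waypoint representation, and the toy data are hypothetical, and it assumes the previous plan has already been expressed in the current ego frame.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]      # (x, y) waypoint; a hypothetical representation
Trajectory = Sequence[Point]


def directed_hausdorff(a: Trajectory, b: Trajectory) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Trajectory, b: Trajectory) -> float:
    """Symmetric Hausdorff distance d_H between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def select_closest_mode(candidates: List[Trajectory], previous: Trajectory) -> int:
    """Index of the candidate with the smallest d_H to the previous plan."""
    distances = [hausdorff(c, previous) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


# Toy data (assumed, not from the paper): the previous plan goes straight ahead.
previous_plan = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],    # veers right
    [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0)],    # roughly straight
    [(0.0, 0.0), (-1.0, 1.0), (-2.0, 2.0)],  # veers left
]

best = select_closest_mode(candidates, previous_plan)
print(best)  # index of the roughly straight candidate
```

Because $d_H$ compares trajectories as point sets rather than waypoint by waypoint, it measures agreement in overall shape, which is why it suits matching a new proposal against a past path that covers a shifted time window.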
