Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, Yadan Luo

TL;DR

MomAD addresses temporal instability in end-to-end autonomous driving through momentum-aware planning, which combines trajectory momentum, provided by Topological Trajectory Matching (TTM), with perception momentum, provided by the Momentum Planning Interactor (MPI). TTM uses the Hausdorff distance $d_H$ to align current trajectory proposals with the previously planned path, while MPI cross-attends the selected plan with historical queries to enrich long-horizon context. A Robust Instance Denoising module improves planning robustness, and a Trajectory Prediction Consistency (TPC) metric quantifies planning stability. Across nuScenes, Turning-nuScenes, and Bench2Drive, MomAD achieves state-of-the-art planning metrics, improves long-horizon consistency (≥3 s), reduces collision rates (e.g., by 26% on Turning-nuScenes over a 6 s horizon), and improves perception and motion prediction metrics, demonstrating robust performance under occlusions and dynamic conditions. The work introduces the Turning-nuScenes dataset and the TPC metric to better evaluate temporal consistency, and discusses limitations such as mode collapse under teacher forcing, with future work exploring diffusion-based decoding for greater trajectory diversity.
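The trajectory-momentum step in TTM can be illustrated with a small sketch. It assumes each plan is a list of (x, y) waypoints and that the previous frame's plan has already been transformed into the current ego frame. The names `hausdorff` and `select_by_momentum` are hypothetical, and the selection here uses the Hausdorff distance alone, whereas MomAD may also weigh the planner's own trajectory scores, a detail this summary does not cover.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def directed_hausdorff(a: Trajectory, b: Trajectory) -> float:
    """Largest distance from any waypoint of `a` to its nearest waypoint in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Trajectory, b: Trajectory) -> float:
    """Symmetric Hausdorff distance d_H(a, b) between two waypoint sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def select_by_momentum(candidates: List[Trajectory], previous: Trajectory) -> int:
    """Index of the candidate plan closest to the previous plan under d_H.

    Hypothetical helper: pure d_H selection, ignoring any planner confidence.
    """
    return min(range(len(candidates)), key=lambda i: hausdorff(candidates[i], previous))


# Example: two proposals for the current frame, plus last frame's plan
# (assumed to be already expressed in the current ego frame).
candidates = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # keep straight
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)],  # swerve left
]
previous = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2)]
print(select_by_momentum(candidates, previous))  # 0
```

Because $d_H$ compares waypoint sets rather than index-aligned points, it tolerates small timing offsets between plans that follow the same path; a per-waypoint L2 distance would be a simpler alternative but would penalize such offsets more.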

Abstract

End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine trajectory predictions. MomAD comprises two core components: (1) Topological Trajectory Matching (TTM) employs the Hausdorff distance to select the optimal planning query that aligns with prior paths, ensuring coherence; (2) the Momentum Planning Interactor (MPI) cross-attends the selected planning query with historical queries to expand static and dynamic perception fields. This enriched query, in turn, helps regenerate long-horizon trajectories and reduce collision risks. To mitigate noise arising from dynamic environments and detection errors, we introduce robust instance denoising during training, enabling the planning model to focus on critical signals and improve its robustness. We also propose a novel Trajectory Prediction Consistency (TPC) metric to quantitatively assess planning stability. Experiments on the nuScenes dataset demonstrate that MomAD achieves superior long-term consistency (≥3 s) compared to SOTA methods. Moreover, evaluations on the curated Turning-nuScenes dataset show that MomAD reduces the collision rate by 26% and improves TPC by 0.97 m (33.45%) over a 6 s prediction horizon, while closed-loop evaluation on Bench2Drive demonstrates up to a 16.3% improvement in success rate.
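The abstract names the TPC metric without defining it, so the following is only a hedged sketch under an assumed definition: the mean L2 distance between waypoints of two consecutive plans that refer to the same future instant, averaged over a sequence of frames (lower means steadier, matching the abstract's reporting of TPC gains in meters). It further assumes all plans share one coordinate frame and that one frame step equals one waypoint interval; `tpc_pair` and `tpc_sequence` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tpc_pair(prev_plan: Sequence[Point], curr_plan: Sequence[Point]) -> float:
    """Mean L2 gap between two consecutive plans over their shared instants.

    Assumption: prev_plan[k + 1] and curr_plan[k] describe the same future
    instant (one frame step == one waypoint interval, shared coordinate frame).
    """
    overlap = list(zip(prev_plan[1:], curr_plan))
    if not overlap:
        raise ValueError("plans share no future instants")
    return sum(math.dist(p, c) for p, c in overlap) / len(overlap)


def tpc_sequence(plans: List[Sequence[Point]]) -> float:
    """Average TPC over all consecutive pairs of per-frame plans (lower = steadier)."""
    if len(plans) < 2:
        raise ValueError("need at least two frames")
    gaps = [tpc_pair(a, b) for a, b in zip(plans, plans[1:])]
    return sum(gaps) / len(gaps)


# Example: frame 2 re-plans the same path; frame 3 shifts it sideways by 1 m.
plans = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    [(3.0, 1.0), (4.0, 1.0), (5.0, 1.0)],
]
print(tpc_pair(plans[0], plans[1]))  # 0.0
print(tpc_pair(plans[1], plans[2]))  # 1.0
print(tpc_sequence(plans))           # 0.5
```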

Paper Structure

This paper contains 21 sections, 13 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: (a) Deterministic Planning [uniad, jiang2023vad, fusionad, ST_P3] predicts deterministic trajectories but lacks action diversity, posing safety risks. (b) Multi-modal Trajectory Planning [sun2024sparsedrive, chen2024vadv2, hedrive_zhangxingyu] selects the highest-scoring trajectory among multi-modal candidates, yet does not ensure temporal stability and consistency, risking vehicle trembling. (c) Momentum Planning leverages trajectory and perception momentum to enhance current planning through historical guidance, overcoming temporal inconsistency.
  • Figure 2: The overall architecture of MomAD. MomAD, a multi-modal trajectory end-to-end autonomous driving method, first encodes multi-view images into feature maps, then learns a sparse scene representation through a perturbation-based robust instance denoising module, and finally performs momentum planning through the Topological Trajectory Matching (TTM) and Momentum Planning Interactor (MPI) modules. Our approach addresses critical challenges of stability and robustness in dynamic driving conditions.
  • Figure 3: Illustration of the Momentum Planning Interactor (MPI). MPI cross-attends a selected planning query with historical queries to expand static and dynamic perception fields, yielding an enriched query that improves long-horizon trajectory generation and reduces collision risks.
  • Figure 4: Visualization results of MomAD compared with UniAD, VAD, and SparseDrive across multiple frames. MomAD achieves better temporal consistency, both in its predicted trajectories relative to the ground truth (GT) and in terms of the TPC metric.
  • Figure A1: Visualization of turning scenarios in the Turning-nuScenes dataset. "LIDAR_TOP" shows the corresponding scene from a bird's-eye view (BEV), while "CAMERA_FRONT" shows the image captured by the ego vehicle's front camera in the same scene.
  • ...and 3 more figures
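The cross-attention that MPI performs (Figure 3) can be sketched as single-head scaled dot-product attention in which the current planning query attends over a bank of queries from earlier frames, followed by a residual add. This is an illustrative simplification, not the authors' module: it assumes identity key/value projections, a single head, and a plain list of past query vectors, and the names `cross_attend` and `momentum_enrich` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Sequence[float], history: Sequence[Sequence[float]]) -> Vector:
    """One query attends over past queries, which serve as both keys and values."""
    if not history:
        raise ValueError("history must contain at least one past query")
    d = len(query)
    # Scaled dot-product scores between the current query and each past query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in history]
    weights = softmax(scores)
    # Weighted sum of past queries, dimension by dimension.
    return [sum(w * v[j] for w, v in zip(weights, history)) for j in range(d)]


def momentum_enrich(query: Sequence[float], history: Sequence[Sequence[float]]) -> Vector:
    """Residual update: current planning query plus the attended historical context."""
    context = cross_attend(query, history)
    return [q + c for q, c in zip(query, context)]


# Example: the past query aligned with the current one dominates the context.
current = [1.0, 0.0]
past_queries = [[10.0, 0.0], [0.0, 10.0]]
enriched = momentum_enrich(current, past_queries)
print([round(x, 2) for x in enriched])
```

In a full model the keys and values would come from learned projections, typically with multiple heads; the identity projection here only keeps the sketch self-contained.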