PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

TL;DR

PPAD addresses the challenge of integrating prediction and planning in end-to-end autonomous driving by introducing an iterative, timestep-wise interaction between ego and surrounding agents under an autoregressive framework. It interleaves prediction and planning at every future step and employs hierarchical dynamic key objects attention to model ego-agent-environment interactions, including BEV features, map elements, and agent queries. The method trains with noisy trajectories and end-to-end losses, achieving state-of-the-art performance on nuScenes and Argoverse2 with improved L2 accuracy and reduced collision rates. This work highlights the importance of multi-step, bidirectional interaction for safer, more reliable autonomous driving and offers a scalable framework for future BEV-based end-to-end systems.

Abstract

We present a new interaction mechanism of prediction and planning for end-to-end autonomous driving, called PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving), which considers the timestep-wise interaction to better integrate prediction and planning. An ego vehicle performs motion planning at each timestep based on the trajectory prediction of surrounding agents (e.g., vehicles and pedestrians) and its local road conditions. Unlike existing end-to-end autonomous driving frameworks, PPAD models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Prediction and Planning processes at every timestep, instead of a single sequential process of prediction followed by planning. Specifically, we design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions. The experiments on the nuScenes benchmark show that our approach outperforms state-of-the-art methods.
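The timestep-wise interleaving described in the abstract can be illustrated with a toy rollout. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D `State`, the functions `predict_agent_step` and `plan_ego_step`, and all thresholds are hypothetical stand-ins for the learned prediction and planning modules. The only point it shows is the control flow: at every future step, the agent is predicted conditioned on the current ego state, and the ego then plans conditioned on the freshly updated agent state.

```python
from dataclasses import dataclass


@dataclass
class State:
    pos: float  # longitudinal position in meters (toy 1-D world, an assumption)
    vel: float  # speed in m/s


def predict_agent_step(agent: State, ego: State, dt: float) -> State:
    """Hypothetical one-step predictor: the agent accelerates when the ego is close."""
    gap = abs(ego.pos - agent.pos)
    accel = 1.0 if gap < 10.0 else 0.0
    vel = agent.vel + accel * dt
    return State(agent.pos + vel * dt, vel)


def plan_ego_step(ego: State, agent_pred: State, dt: float) -> State:
    """Hypothetical one-step planner: the ego brakes if the predicted agent is too close."""
    gap = abs(agent_pred.pos - ego.pos)
    accel = -2.0 if gap < 8.0 else 0.0
    vel = max(0.0, ego.vel + accel * dt)
    return State(ego.pos + vel * dt, vel)


def interleaved_rollout(ego: State, agent: State, steps: int, dt: float = 0.5):
    """Alternate prediction and planning at every step instead of running
    one full prediction pass followed by one full planning pass."""
    ego_traj, agent_traj = [], []
    for _ in range(steps):
        agent = predict_agent_step(agent, ego, dt)  # prediction sees the current ego
        ego = plan_ego_step(ego, agent, dt)         # planning sees the updated agent
        agent_traj.append(agent)
        ego_traj.append(ego)
    return ego_traj, agent_traj


if __name__ == "__main__":
    ego_traj, agent_traj = interleaved_rollout(State(5.0, 5.0), State(0.0, 5.0), steps=6)
    for step, (e, a) in enumerate(zip(ego_traj, agent_traj), start=1):
        print(step, e, a)
```

In a one-shot pipeline, the whole agent trajectory would be fixed before the ego plans; here each step of either party can react to the other's latest step, which is the behaviour the abstract attributes to the autoregressive design.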

Paper Structure (18 sections, 8 equations, 4 figures, 6 tables, 1 algorithm)


Figures (4)

  • Figure 1: A high-level illustration of our proposed PPAD framework. The agent (in blue) intends to drive straight, while the ego (in red) plans to change lanes. Fig. 1(a) presents the typical one-shot method, which might produce invalid motion plans and lead to an accident because of a lack of in-depth interactions. Fig. 1(b) demonstrates the gaming process between the ego and the agent under the PPAD architecture. During the prediction process, the agent executes an assertive plan by accelerating to stop the ego from blocking its route. In the planning process, the ego plans its trajectory based on the preceding prediction of the agent. The ego decelerates to avoid a potential accident and then changes lanes to achieve its driving goal.
  • Figure 2: Overall architecture of our proposed self-driving framework, PPAD. It consists of the Perception Transformer and the Iterative Prediction-Planning Module. The Perception Transformer encodes scene contexts into agent queries, map queries, and BEV queries. Then, the Prediction-Planning Module interleaves agent motion prediction and ego planning $N$ times. Throughout the iterative Prediction and Planning processes, in-depth interactions are conducted among the ego, agents, map elements, and BEV features. In the Prediction process, the agent initially intends to go straight and is unaware of the potential motion of the ego. After interacting with the ego, map elements, and BEV features, the agent becomes assertive and accelerates. In the following Planning process, the ego learns that the agent will accelerate by interacting with the updated agent query. It eventually plans to decelerate first and then change lanes for safety reasons.
  • Figure 3: Qualitative results of PPAD. The green box denotes the ego vehicle, while the red boxes denote other agents.
  • Figure: Pseudocode of key objects attention in a PyTorch-like style.
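
The pseudocode referred to in the last caption is not reproduced on this page. As a rough illustration of the idea only, the sketch below restricts attention to objects near a reference position and applies it over a coarse-to-fine list of radii. Several things here are assumptions rather than details from the paper: selecting key objects by Euclidean distance, the residual update, the function names `key_objects_attention` and `hierarchical_attention`, and the use of plain Python lists in place of PyTorch tensors.

```python
import math


def key_objects_attention(query, keys, values, positions, ref_pos, radius):
    """Scaled dot-product attention from one query to the objects whose
    position lies within `radius` of `ref_pos` (assumed selection rule)."""
    selected = [i for i, p in enumerate(positions) if math.dist(p, ref_pos) <= radius]
    if not selected:
        # No key object in range: contribute nothing to the query update.
        return [0.0] * len(query), selected
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, keys[i])) / scale for i in selected]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[selected[0]])
    out = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    return out, selected


def hierarchical_attention(query, keys, values, positions, ref_pos, radii):
    """Apply key objects attention once per radius (coarse to fine),
    adding each result to the query as a residual update (an assumption)."""
    for radius in radii:
        out, _ = key_objects_attention(query, keys, values, positions, ref_pos, radius)
        query = [q + o for q, o in zip(query, out)]
    return query


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    positions = [(1.0, 0.0), (2.0, 0.0), (50.0, 0.0)]
    print(hierarchical_attention([1.0, 0.0], keys, values, positions, (0.0, 0.0), [5.0, 1.5]))
```

In the example call, the object at distance 50 is never attended to, and the second, tighter radius keeps only the nearest object, so the query is refined with progressively more local context.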