ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving

Zhiyu Zheng, Shaoyu Chen, Haoran Yin, Xinbang Zhang, Jialv Zou, Xinggang Wang, Qian Zhang, Lefei Zhang

TL;DR

The paper tackles the challenge of imbalanced spatio-temporal trajectory data in end-to-end autonomous driving by reframing trajectory prediction as residual learning against a deterministic inertial reference. It introduces Normalized Residual Trajectory Modeling (ResAD), which learns a context-driven residual $\boldsymbol{r} = \boldsymbol{\tau}_{\mathrm{gt}} - \boldsymbol{\tau}_{\mathrm{ref}}$ using diffusion decoders; key innovations include Trajectory Residual Modeling (TRM), Point-wise Residual Normalization (PRNorm), and Inertial Reference Perturbation (IRP), plus a Multimodal Trajectory Ranker for selecting optimal outputs. On NAVSIM v1/v2 benchmarks, ResAD achieves state-of-the-art PDMS and EPDMS with only two denoising steps, demonstrating improved safety and planning reliability while remaining compute-efficient. These results suggest a scalable, interpretable framework that reduces reliance on spurious correlations and improves near-term safety-critical decisions in real-world driving. Code will be released to facilitate further research.

Abstract

End-to-end autonomous driving (E2EAD) systems, which learn to predict future trajectories directly from sensor data, are fundamentally challenged by the inherent spatio-temporal imbalance of trajectory data. This imbalance creates a significant optimization burden, causing models to learn spurious correlations instead of robust driving logic, while also prioritizing uncertain, distant predictions, thereby compromising immediate safety. To address these issues, we propose ResAD, a novel Normalized Residual Trajectory Modeling framework. Instead of predicting the future trajectory directly, our approach reframes and simplifies the learning task by predicting the residual deviation from a deterministic inertial reference. This inertial reference serves as a strong physical prior, compelling the model to move beyond simple pattern-matching and instead focus its capacity on learning the necessary, context-driven deviations (e.g., traffic rules, obstacles) from this default, inertially-guided path. To mitigate the optimization imbalance caused by uncertain, long-term horizons, ResAD further incorporates Point-wise Normalization of the predicted residual. This technique re-weights the optimization objective, preventing large-magnitude errors associated with distant, uncertain waypoints from dominating the learning signal. On the NAVSIM v1 and v2 benchmarks, ResAD achieves state-of-the-art results of 88.8 PDMS and 85.5 EPDMS with only two denoising steps, demonstrating that ResAD significantly simplifies the learning task and improves planning performance. The code will be released to facilitate further research.
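The residual formulation and point-wise normalization described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the inertial reference is taken to be a constant-velocity extrapolation in the ego frame, the normalization is taken to be per-waypoint standardization (mean and standard deviation estimated over a set of residuals), and all function names are hypothetical.

```python
from statistics import mean, pstdev

def inertial_reference(velocity, horizon, dt):
    """Assumed inertial reference: constant-velocity rollout from the ego origin."""
    vx, vy = velocity
    return [(vx * dt * k, vy * dt * k) for k in range(1, horizon + 1)]

def residual(traj_gt, traj_ref):
    """r = tau_gt - tau_ref, computed waypoint by waypoint."""
    return [(gx - rx, gy - ry) for (gx, gy), (rx, ry) in zip(traj_gt, traj_ref)]

def pointwise_stats(residuals, eps=1e-6):
    """Per-waypoint, per-axis (mean, std) over a collection of residual trajectories."""
    stats = []
    for points in zip(*residuals):  # all samples of the same waypoint index
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        stats.append(((mean(xs), pstdev(xs) + eps), (mean(ys), pstdev(ys) + eps)))
    return stats

def normalize(res, stats):
    """Standardize each waypoint with its own statistics (assumed form of PRNorm)."""
    return [((x - mx) / sx, (y - my) / sy)
            for (x, y), ((mx, sx), (my, sy)) in zip(res, stats)]

def denormalize(res_norm, stats):
    """Invert normalize() to recover the residual in metres."""
    return [(x * sx + mx, y * sy + my)
            for (x, y), ((mx, sx), (my, sy)) in zip(res_norm, stats)]

def reconstruct(res, traj_ref):
    """tau = tau_ref + r."""
    return [(rx + dx, ry + dy) for (rx, ry), (dx, dy) in zip(traj_ref, res)]

# Toy data: ego moving at 2 m/s along x, three waypoints spaced 0.5 s apart.
ref = inertial_reference(velocity=(2.0, 0.0), horizon=3, dt=0.5)
gt_a = [(1.0, 0.1), (2.0, 0.4), (3.0, 0.9)]  # gradual lateral drift
gt_b = [(0.9, 0.0), (1.6, 0.0), (2.1, 0.0)]  # braking
res_a, res_b = residual(gt_a, ref), residual(gt_b, ref)
stats = pointwise_stats([res_a, res_b])
norm_a = normalize(res_a, stats)
recovered = reconstruct(denormalize(norm_a, stats), ref)
```

In this toy setup the raw residuals grow with the horizon, while the normalized residuals have comparable magnitude at every waypoint, which is the re-weighting effect the abstract attributes to point-wise normalization.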

Paper Structure

This paper contains 14 sections, 15 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The core motivation for ResAD: Addressing the Challenge of Imbalanced Trajectory Data. (a) Visualization of the longitudinal data distribution of dataset trajectories under three modeling strategies. Raw Trajectories exhibit significant mean drift and increasing variance, creating a planning horizon dilemma. Our Trajectory Residual centers the distribution, and the Normalized Residual further stabilizes the variance for a simpler, balanced learning objective. (b) Conceptual comparison. Existing methods learn complex raw trajectories directly, risking reliance on spurious correlations. Our ResAD simplifies the task by learning to predict only the necessary residual deviation from a strong physical prior (the inertial reference), focusing the model on context-aware corrections.
  • Figure 2: The proposed ResAD framework. Instead of predicting the entire trajectory, ResAD establishes a strong physical prior based on the vehicle's current state: the Inertial Reference. By applying Inertial Reference Perturbation, the framework generates a diverse set of initial intent hypotheses. Finally, the diffusion decoder (DiffDecoder), conditioned on these references, learns to predict the necessary Normalized Residuals. We highlight the 1st-ranked and 5th-ranked output trajectories, denoted as Top-1 Traj and Top-5 Traj.
  • Figure 3: Visual comparison of ResAD. This figure compares the 20 trajectory candidates from ResAD and DiffusionDrive. DiffusionDrive relies on a static, context-agnostic vocabulary, often proposing infeasible or irrelevant trajectories (highlighted by red circles). In contrast, ResAD dynamically generates a set of context-aware trajectories via IRP. This demonstrates our method's more efficient multimodal exploration, which avoids wasting capacity on invalid options and leads to a higher multimodal planning quality ($\mathcal{P}_{m}$), as validated in Sec. \ref{sec:abl}.
  • Figure 4: PRNorm stabilizes optimization and accelerates convergence. This figure compares the training dynamics of ResAD with and without PRNorm.
  • Figure 5: The impact of Inertial Reference. We highlight the 1st-ranked and 5th-ranked trajectory with their corresponding IRs. Red arrows are used to highlight key examples of the predicted residual.
  • ...and 1 more figure
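The Inertial Reference Perturbation step mentioned in the Figure 2 and Figure 3 captions can be sketched as follows. This is only an illustration under stated assumptions: the perturbation is modeled as Gaussian noise added to the ego velocity before a constant-velocity rollout, and the function name `perturbed_references` together with the `sigma` and `seed` parameters are hypothetical. The captions do not specify how the perturbation is actually applied.

```python
import random

def perturbed_references(velocity, horizon, dt, num_modes, sigma, seed=0):
    """Generate num_modes inertial references from noisy copies of the ego velocity.

    Assumption: each hypothesis is a constant-velocity rollout of a velocity
    perturbed by zero-mean Gaussian noise with standard deviation sigma (m/s).
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    vx, vy = velocity
    references = []
    for _ in range(num_modes):
        pvx = vx + rng.gauss(0.0, sigma)
        pvy = vy + rng.gauss(0.0, sigma)
        references.append([(pvx * dt * k, pvy * dt * k)
                           for k in range(1, horizon + 1)])
    return references

# 20 hypotheses, matching the number of candidates compared in Figure 3.
refs = perturbed_references(velocity=(5.0, 0.0), horizon=8, dt=0.5,
                            num_modes=20, sigma=0.5)
```

Each perturbed reference would then condition the decoder, which predicts one normalized residual per hypothesis; a ranker selects among the resulting trajectories.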