RAPiD: Real-time Deterministic Trajectory Planning via Diffusion Behavior Priors for Safe and Efficient Autonomous Driving

Ruturaj Reddy, Hrishav Bakul Barua, Junn Yong Loo, Thanh Thi Nguyen, Ganesh Krishnasamy

TL;DR

RAPiD tackles the latency and safety challenges of diffusion-based trajectory planning by distilling a pretrained diffusion planner into a deterministic policy. It uses Score-Regularized Policy Optimization (SRPO) to regularize policy learning with the score of the diffusion prior, and trains a safety-focused critic via Implicit Q-Learning (IQL) on rewards from the Predictive Driver Model (PDM) scorer. The approach yields an 8× speedup over diffusion baselines and achieves state-of-the-art generalization among learning-based planners on interPlan, with safety promoted through PDM-based supervision. By combining the expressiveness of diffusion models with the inference efficiency of deterministic policies, RAPiD offers a practical path to real-time autonomous driving deployment in complex traffic scenarios.

Abstract

Diffusion-based trajectory planners have demonstrated strong capability for modeling the multimodal nature of human driving behavior, but their reliance on iterative stochastic sampling poses critical challenges for real-time, safety-critical deployment. In this work, we present RAPiD, a deterministic policy extraction framework that distills a pretrained diffusion-based planner into an efficient policy while eliminating diffusion sampling. Using score-regularized policy optimization, we leverage the score function of the pretrained diffusion planner as a behavior prior to regularize policy learning. To promote safety and passenger comfort, the policy is optimized using a critic trained to imitate a predictive driver controller, providing dense, safety-focused supervision beyond conventional imitation learning. Evaluations demonstrate that RAPiD achieves competitive performance on closed-loop nuPlan scenarios with an 8× speedup over diffusion baselines, while achieving state-of-the-art generalization among learning-based planners on the interPlan benchmark. The official repository of this work is available at https://github.com/ruturajreddy/RAPiD.
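
For orientation, the score-regularized objective referred to above can be written in the general form used by SRPO. This is a sketch under assumed notation, not the paper's own equations: $\pi_\theta$ is the deterministic policy, $Q_\phi$ the critic, $\mu$ the behavior distribution modeled by the pretrained diffusion planner, $\mathcal{D}$ the offline dataset, and $\beta$ a temperature that trades critic value against closeness to the prior.

```latex
% Sketch of the SRPO objective for a deterministic policy \pi_\theta
% (notation assumed for illustration, not copied from the paper)
\max_{\theta}\;
  \mathbb{E}_{s \sim \mathcal{D}}\Big[ Q_\phi\big(s, \pi_\theta(s)\big) \Big]
  \;-\; \frac{1}{\beta}\,
  \mathbb{E}_{s \sim \mathcal{D}}\Big[ D_{\mathrm{KL}}\big( \pi_\theta(\cdot \mid s) \,\|\, \mu(\cdot \mid s) \big) \Big]

% For a deterministic policy, the KL term contributes the behavior score
\nabla_\theta \mathcal{L}_\pi(\theta)
  = \mathbb{E}_{s \sim \mathcal{D}}\Big[
      \Big( \nabla_a Q_\phi(s, a) + \frac{1}{\beta}\, \nabla_a \log \mu(a \mid s) \Big)\Big|_{a = \pi_\theta(s)}
      \nabla_\theta \pi_\theta(s)
    \Big]
```

The term $\nabla_a \log \mu(a \mid s)$ is the score of the behavior prior, which a pretrained diffusion model can approximate directly. This is why no iterative sampling is needed once the policy has been trained.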

Paper Structure

This paper contains 20 sections, 9 equations, 13 figures, 3 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Panels (a-1) and (a-2) illustrate the limitations of the baseline DiffusionPlanner: its computationally intensive, stochastic sampling (100 ms latency) delays the reaction and results in a collision. In contrast, panels (b-1) and (b-2) show the proposed RAPiD framework. By distilling the diffusion prior into a deterministic policy trained on PDM safety metrics rather than nuPlan metrics, RAPiD achieves $8\times$ faster inference. This efficiency enables timely decisions, resulting in a smooth, collision-free maneuver.
  • Figure 2: Overview of the proposed framework (Section \ref{sec:method}). Stage 1 involves offline replay buffer construction, where raw sensor data is processed by a frozen DiffusionPlanner encoder to generate rich latent state embeddings ($s$). Ground truth trajectories are evaluated by the PDM Scorer to assign rewards ($r$) based on safety and comfort metrics, creating a scored dataset. Stage 2 focuses on Critic Training via Implicit Q-Learning. The critic learns to estimate the Q-value by minimizing the loss $\mathcal{L}_Q(\phi)$ on ground truth trajectories ($a$) conditioned on the frozen embeddings ($s$), effectively distinguishing between safe (high PDM score) and unsafe behaviors. Stage 3 performs Deterministic Policy Training. A Transformer-based policy is distilled using the surrogate loss gradient $\nabla_{\theta}\mathcal{L}_{\pi}^{surr}(\theta)$. This gradient fuses the Critic's guidance (maximizing safety rewards) with the Frozen Diffusion Prior's score function, which regularizes the policy toward the manifold of realistic trajectories and enables fast, one-step deterministic inference.
  • Figure 3: Qualitative results across four scenarios in nuPlan: (a) Following Lane, (b) Stopping with Lead, (c) Starting Right Turn, and (d) Low Speed Maneuvering. Each figure displays the generated trajectories for DiffusionPlanner and RAPiD (top) along with their corresponding Comfort metrics (bottom), including acceleration, speed, and yaw rate, demonstrating the smoother control exerted by RAPiD compared to the baseline DiffusionPlanner.
  • Figure 4: Breakdown of PDM scorer sub-metrics across (a) Val14, (b) Test14, and (c) Test14-Hard splits. RAPiD (Green) consistently outperforms DiffusionPlanner (Red) in critical safety metrics (No Collision, TTC) while trading Progress for safer maneuvers. Metric Definitions: No Col.: No Collision; Driv. Area: Drivable Area; Speed Lim.: Speed Limit; TTC: Time-to-Collision; Lane Foll.: Lane Following.
  • Figure S1: Supplementary overview of PDM metrics.
  • ...and 8 more figures
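
Stages 2 and 3 of the pipeline described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It assumes the standard Implicit Q-Learning losses and the general SRPO surrogate form. The function names, the expectile `tau`, the discount `gamma`, the weight `weight` (standing in for a time-dependent weighting), and the temperature `beta` are all hypothetical choices for illustration, and plain arrays stand in for network outputs. The paper's actual losses and hyperparameters may differ.

```python
import numpy as np


def expectile_loss(u, tau=0.7):
    """Asymmetric squared error |tau - 1(u < 0)| * u**2, averaged over the batch."""
    u = np.asarray(u, dtype=float)
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    weight = np.where(u < 0.0, 1.0 - tau, tau)
    return float(np.mean(weight * u ** 2))


def iql_losses(q_sa, v_s, v_next, reward, gamma=0.99, tau=0.7):
    """Standard IQL value and Q losses on a batch of dataset transitions.

    q_sa:   Q(s, a) for dataset (ground truth) actions
    v_s:    V(s) for the current states
    v_next: V(s') for the next states
    reward: per-transition reward (assumed here to come from a PDM-style scorer)
    """
    q_sa, v_s = np.asarray(q_sa, float), np.asarray(v_s, float)
    v_next, reward = np.asarray(v_next, float), np.asarray(reward, float)
    # V is regressed toward an upper expectile of Q over dataset actions.
    value_loss = expectile_loss(q_sa - v_s, tau)
    # Q is regressed toward a one-step target that only uses V, never max_a Q.
    td_target = reward + gamma * v_next
    q_loss = float(np.mean((td_target - q_sa) ** 2))
    return value_loss, q_loss


def srpo_action_direction(grad_q, eps_pred, eps, weight, beta):
    """Ascent direction in action space for an SRPO-style surrogate.

    grad_q:   gradient of the critic w.r.t. the policy action
    eps_pred: noise predicted by the frozen diffusion prior at the noised action
    eps:      the noise that was actually added (acts as a baseline)
    """
    grad_q = np.asarray(grad_q, float)
    residual = np.asarray(eps_pred, float) - np.asarray(eps, float)
    # Critic guidance minus the weighted score-regularization term.
    return grad_q - (weight / beta) * residual
```

In a full training loop, the returned direction would be back-propagated through the policy network (multiplied by the Jacobian of the policy output with respect to its parameters). That step is omitted here to keep the sketch framework-independent.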