
LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving

Jinhao Zhang, Wenlong Xia, Zhexuan Zhou, Youmin Gong, Jie Mei

TL;DR

This work introduces LAP, a latent diffusion planner for autonomous driving that plans in a VAE-learned latent space to separate high-level driving intents from low-level kinematics. A conditional latent diffusion model generates multi-modal plans, and a fine-grained feature-distillation mechanism bridges the semantic planning space and the vectorized scene context. LAP achieves state-of-the-art closed-loop performance on the nuPlan benchmark with up to a 10x inference speedup and shows robust multi-modal behavior with few denoising steps. The approach reduces reliance on post-processing and offers insight into the benefits of latent-space planning for efficient, diverse autonomous driving policies.

Abstract

Diffusion models have demonstrated strong capabilities for modeling human-like driving behavior in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics rather than on high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP), a framework that plans in a VAE-learned latent space in which high-level intents are disentangled from low-level kinematics, enabling the planner to capture rich, multi-modal driving strategies. We further introduce a fine-grained feature distillation mechanism that improves the interaction and fusion between the high-level semantic planning space and the vectorized scene context. Notably, LAP can produce high-quality plans in a single denoising step, substantially reducing computational overhead. In extensive evaluations on the large-scale nuPlan benchmark, LAP achieves state-of-the-art closed-loop performance among learning-based planning methods while delivering an inference speed-up of up to 10 times over previous state-of-the-art approaches. Code will be released at: https://github.com/jhz1192/Latent-Planner.
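The data flow of single-step planning in a latent space can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in, not the authors' model: a two-dimensional latent (normalized speed and yaw rate) plays the role of the VAE latent, a unicycle integrator plays the role of the VAE decoder, and a nearest-intent rule `toy_denoiser` replaces the trained conditional diffusion network. The names `decode`, `toy_denoiser` and `sample_plans` are illustrative only. The sketch draws latent noise, applies one denoising call conditioned on scene-provided intents, and decodes the result to waypoints; repeating this with independent noise yields several candidate plans, loosely analogous to the $K_{\text{mode}}$ proposals in Figure 3.

```python
import math
import random
from typing import List, Sequence, Tuple

DT = 0.5      # hypothetical spacing between waypoints, in seconds
HORIZON = 8   # hypothetical number of future waypoints (4 s of planning)

Latent = Tuple[float, float]     # normalized (speed, yaw rate): a toy "intent" code
Waypoint = Tuple[float, float]   # (x, y) in metres, ego frame


def decode(z: Latent) -> List[Waypoint]:
    """Toy stand-in for a VAE decoder: the latent only says how fast to go and
    how sharply to turn; the kinematics are produced here by a unicycle model."""
    speed = 8.0 + 2.0 * z[0]      # de-normalize to m/s
    yaw_rate = 0.2 * z[1]         # de-normalize to rad/s (positive = left turn)
    x = y = heading = 0.0
    waypoints = []
    for _ in range(HORIZON):
        heading += yaw_rate * DT
        x += speed * math.cos(heading) * DT
        y += speed * math.sin(heading) * DT
        waypoints.append((x, y))
    return waypoints


def toy_denoiser(z_noisy: Latent, intents: Sequence[Latent]) -> Latent:
    """Hypothetical stand-in for a trained conditional network that predicts the
    clean latent in one step; here it snaps the input to the nearest scene intent."""
    return min(intents, key=lambda c: (c[0] - z_noisy[0]) ** 2 + (c[1] - z_noisy[1]) ** 2)


def sample_plans(intents: Sequence[Latent], k_mode: int, seed: int = 0) -> List[List[Waypoint]]:
    """Single-step latent sampling: one denoiser call per candidate, then decode."""
    rng = random.Random(seed)
    plans = []
    for _ in range(k_mode):
        z_T = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # pure Gaussian noise in latent space
        z_0 = toy_denoiser(z_T, intents)                    # the only denoising step
        plans.append(decode(z_0))                           # map back to trajectory space
    return plans


# Hypothetical scene context: keep lane, turn left, slow left turn (normalized units).
intents = [(0.0, 0.0), (0.0, 1.5), (-1.0, 1.5)]
for plan in sample_plans(intents, k_mode=3):
    print(plan[-1])  # final waypoint of each candidate
```

In this sketch the network is evaluated once per candidate instead of once per step of a long sampling chain, and all waypoint-level detail is deferred to the decoder. It does not reproduce LAP's architecture, its feature distillation, or its reported speed-up.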

Paper Structure

This paper contains 34 sections, 31 equations, 13 figures, 13 tables.

Figures (13)

  • Figure 1: Overall architecture of Latent Planner
  • Figure 2: Initial State Injection.
  • Figure 3: Comparison of trajectory proposals in a right-turn scenario. The figure shows LAP (left) and Diffusion Planner (right), each of which samples $K_{\text{mode}}=3$ candidate trajectories (thin colored lines) at every planning cycle. Notably, the proposals from LAP exhibit pronounced multi-modality, covering a diverse range of turning radii and speeds while following the navigation route.
  • Figure 4: Impact of designed modules.
  • Figure 4: Closed-loop planning results. To showcase our planner, we chose 4 scenarios that involve turning, lane changing, and interactions with Vulnerable Road Users (VRUs). Each row shows one scenario at 0, 5, 10, and 15 seconds. Each frame includes the planned future trajectory of the ego vehicle and the ground-truth ego trajectory.