Temporally Decoupled Diffusion Planning for Autonomous Driving

Xiang Li, Bikun Wang, John Zhang, Jianjun Wang

Abstract

Motion planning in dynamic urban environments requires balancing immediate safety with long-term goals. While diffusion models effectively capture multi-modal decision-making, existing approaches treat trajectories as monolithic entities and overlook heterogeneous temporal dependencies: near-term plans are constrained by instantaneous dynamics, whereas far-term plans are constrained by navigational goals. To address this, we propose the Temporally Decoupled Diffusion Model (TDDM), which reformulates trajectory generation via a noise-as-mask paradigm. By partitioning trajectories into segments with independent noise levels, we implicitly treat high noise as information voids and weak noise as contextual cues. This compels the model to reconstruct corrupted near-term states by leveraging their internal correlations with better-preserved temporal contexts. Architecturally, we introduce a Temporally Decoupled Adaptive Layer Normalization (TD-AdaLN) to inject segment-specific timesteps. During inference, our Asymmetric Temporal Classifier-Free Guidance uses weakly noised far-term priors to guide immediate path generation. Evaluations on the nuPlan benchmark show that TDDM approaches or exceeds state-of-the-art baselines, particularly on the challenging Test14-hard subset.
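The noise-as-mask idea and TD-AdaLN can be illustrated with a minimal, dependency-free sketch. It assumes a DDPM-style linear beta schedule with T = 1000 steps, equal-length temporal segments, and a hypothetical TD-AdaLN in which each waypoint token is layer-normalized and then scaled and shifted using the embedding of its own segment's timestep. The function names (`alpha_bar_schedule`, `noise_segments`, `timestep_embedding`, `td_adaln`) and the purely linear modulation are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def alpha_bar_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar[t] of a linear DDPM beta schedule (assumed)."""
    alpha_bar, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def noise_segments(traj, seg_timesteps, alpha_bar, rng):
    """Noise-as-mask: corrupt each equal-length segment with its own timestep.

    A large timestep turns a segment into an information void; a small one
    leaves it as a contextual cue. Returns the noised waypoints and the
    timestep assigned to every waypoint.
    """
    H, K = len(traj), len(seg_timesteps)
    assert H % K == 0, "this sketch assumes equal-length segments"
    seg_len = H // K
    noised, per_step_t = [], []
    for i, wp in enumerate(traj):
        t = seg_timesteps[i // seg_len]
        a = alpha_bar[t]
        # q(x_t | x_0) = sqrt(a) * x_0 + sqrt(1 - a) * eps, eps ~ N(0, I)
        noised.append([math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                       for v in wp])
        per_step_t.append(t)
    return noised, per_step_t


def timestep_embedding(t, dim):
    """Standard sinusoidal embedding of a scalar timestep (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * k / half) for k in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def td_adaln(tokens, per_token_t, scale_w, shift_w, eps=1e-5):
    """Hypothetical TD-AdaLN: normalize each token, then modulate it with a
    scale and shift computed from the embedding of *that token's* timestep
    (plain linear maps here; DiT-style blocks usually add an MLP)."""
    out = []
    for x, t in zip(tokens, per_token_t):
        mu = sum(x) / len(x)
        var = sum((v - mu) ** 2 for v in x) / len(x)
        normed = [(v - mu) / math.sqrt(var + eps) for v in x]
        emb = timestep_embedding(t, len(scale_w[0]))
        scale = [sum(w * e for w, e in zip(row, emb)) for row in scale_w]
        shift = [sum(w * e for w, e in zip(row, emb)) for row in shift_w]
        out.append([n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)])
    return out
```

For example, `noise_segments(traj, [999, 0], alpha_bar_schedule(), random.Random(0))` almost completely corrupts the first half of a trajectory while leaving the second half nearly intact, and `td_adaln` then conditions each waypoint token on its own segment's noise level rather than on a single shared timestep.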

Paper Structure

This paper contains 30 sections, 8 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of our temporally decoupled diffusion model with a full-sequence diffusion model. (a) Full-sequence diffusion model. (b) Temporally decoupled diffusion model (ours). The key difference is that our model supports independent diffusion processes on temporal segments; $t$ denotes the diffusion timestep.
  • Figure 2: Overview of the temporally decoupled diffusion model. We adopt the diffusion transformer (DiT) architecture as the decoder, while the encoder encodes environmental context (static obstacles, agents, and roads). Through temporally decoupled AdaLN, the decoder accepts an independent timestep for each temporal segment.
  • Figure 3: Pipeline of the Asymmetric Temporal CFG. The output trajectory fuses the predictions of an unconditional path and a conditional path. The unconditional path performs standard full-sequence diffusion, while the conditional path, enabled by an asymmetric temporal mask, uses a nearly clean future (far-term) prior to guide the denoising of the earlier (near-term) segment.
  • Figure 4: Comparison of trajectories planned by Diffusion Planner and our proposed TDDM in two challenging traffic scenarios. The ego vehicle's trajectory is shown in purple.
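The fusion described for Figure 3 can be sketched as a single guided denoising step. The sketch assumes a generic `denoiser(traj, per_waypoint_t)` callable (hypothetical), a far-term prior that has already been noised to a small timestep `t_prior`, and the usual classifier-free guidance combination `uncond + w * (cond - uncond)` with guidance weight `w`. The paper's exact weighting, prediction target, and segment split may differ.

```python
from typing import Callable, List

Traj = List[List[float]]
# Hypothetical denoiser signature: (trajectory, per-waypoint timesteps) -> prediction.
Denoiser = Callable[[Traj, List[int]], Traj]


def asymmetric_temporal_cfg(denoiser: Denoiser, x_t: Traj, t: int,
                            far_prior: Traj, t_prior: int, n_near: int,
                            w: float = 2.0) -> Traj:
    """One guided step of a hypothetical asymmetric temporal CFG.

    x_t:       current noisy trajectory; every waypoint sits at timestep t.
    far_prior: far-term waypoints noised only to a small timestep t_prior.
    n_near:    number of near-term waypoints being generated.
    """
    H = len(x_t)
    assert len(far_prior) == H - n_near, "prior must cover the far-term segment"
    # Unconditional path: standard full-sequence diffusion, one shared timestep.
    uncond = denoiser(x_t, [t] * H)
    # Conditional path: the asymmetric temporal mask keeps the near-term segment
    # at t and swaps in the nearly clean far-term prior at t_prior.
    x_cond = x_t[:n_near] + far_prior
    cond = denoiser(x_cond, [t] * n_near + [t_prior] * (H - n_near))
    # Classifier-free guidance fusion: move outputs toward the conditional path.
    return [[u + w * (c - u) for u, c in zip(wp_u, wp_c)]
            for wp_u, wp_c in zip(uncond, cond)]
```

With `w = 1` the result reduces to the conditional path alone and with `w = 0` to the unconditional path; values above 1 extrapolate further in the direction suggested by the far-term prior.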