Decoupled Diffusion Sparks Adaptive Scene Generation

Yunsong Zhou; Naisheng Ye; William Ljungbergh; Tianyu Li; Jiazhi Yang; Zetong Yang; Hongzi Zhu; Christoffer Petersson; Hongyang Li

TL;DR

Nexus tackles the challenge of controllable and reactive driving-scene generation for autonomous systems by decoupling diffusion into goal-oriented and reactive pathways using independent noise states. It introduces noise-masking training to fuse low-noise goal cues with high-noise scene evolution and noise-aware scheduling to update scene tokens in real time. The authors also create Nexus-Data, a large corpus of safety-critical corner cases generated in simulation to improve generalization to rare scenarios. Empirically, Nexus achieves a 40% reduction in displacement error and, with data augmentation, a 20% improvement in closed-loop planning, outperforming prior diffusion-based world-generation approaches.

Abstract

Controllable scene generation could substantially reduce the cost of collecting diverse data for autonomous driving. Prior works formulate traffic layout generation as a prediction process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while short-sighted next-frame prediction lacks precise goal-state guidance. Further, the learned models struggle to generate complex or challenging scenarios because open datasets are dominated by safe, ordinary driving behaviors. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-ins, sudden braking, and collisions. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
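The idea of tokens with independent noise states, where goal tokens stay at low noise while the rest of the scene is heavily noised, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `noise_tokens`, the tensor layout (timesteps, agents, features), the uniform sampling of noise levels, and the linear interpolation between data and Gaussian noise are all assumptions chosen for illustration.

```python
import numpy as np


def noise_tokens(x0, goal_mask, rng, goal_level=0.0):
    """Corrupt scene tokens with an independent noise level per token.

    x0:         (T, A, D) clean tokens (timesteps, agents, features); layout is assumed.
    goal_mask:  (T, A) boolean, True where a token is a goal cue.
    goal_level: noise level forced onto goal tokens (0.0 keeps them clean).
    Returns the noisy tokens and the (T, A) array of noise levels.
    """
    n_steps, n_agents, _ = x0.shape
    # Each token draws its own noise level in [0, 1) (assumed distribution).
    level = rng.uniform(0.0, 1.0, size=(n_steps, n_agents))
    # Noise masking: goal tokens are pinned to a low noise level.
    level = np.where(goal_mask, goal_level, level)
    eps = rng.standard_normal(x0.shape)
    k = level[..., None]  # broadcast the level over the feature axis
    # Assumed corruption rule: linear interpolation between data and noise.
    xt = (1.0 - k) * x0 + k * eps
    return xt, level


rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 3, 4))      # 8 timesteps, 3 agents, 4 features
goal_mask = np.zeros((8, 3), dtype=bool)
goal_mask[-1, :] = True                  # treat the final state of each agent as its goal
xt, level = noise_tokens(x0, goal_mask, rng)
```

In this sketch a denoiser trained on `(xt, level)` pairs would see clean goal tokens alongside noisy scene tokens, which is one plausible reading of how low-noise goal cues could guide high-noise scene evolution.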
