
DragTraffic: Interactive and Controllable Traffic Scene Generation for Autonomous Driving

Sheng Wang, Ge Sun, Fulong Ma, Tianshuai Hu, Qiang Qin, Yongkang Song, Lei Zhu, Junwei Liang

TL;DR

DragTraffic, a generalized, interactive, and controllable traffic scene generation framework based on conditional diffusion, enables non-experts to generate a variety of realistic driving scenarios for different types of traffic agents through an adaptive mixture-of-experts architecture.

Abstract

Evaluating and training autonomous driving systems require diverse and scalable corner cases. However, most existing scene generation methods lack controllability, accuracy, and versatility, which leads to unsatisfactory generation results. Inspired by DragGAN in image generation, we propose DragTraffic, a generalized, interactive, and controllable traffic scene generation framework based on conditional diffusion. DragTraffic enables non-experts to generate a variety of realistic driving scenarios for different types of traffic agents through an adaptive mixture-of-experts architecture. We employ a regression model to provide a general initial solution, followed by a refinement process based on a conditional diffusion model to ensure diversity. User-customized context is introduced through cross-attention to ensure high controllability. Experiments on a real-world driving dataset show that DragTraffic outperforms existing methods in terms of authenticity, diversity, and freedom. Demo videos and code are available at https://chantsss.github.io/Dragtraffic/.
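The cross-attention conditioning mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It assumes, hypothetically, that each agent is represented by a feature vector acting as a query and that the user's edits are encoded as context tokens acting as both keys and values. Learned projection matrices, multiple heads, and the diffusion denoiser itself are omitted, and the residual update is an illustrative choice, not the paper's design. Function names such as `condition_on_user_context` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all key/value pairs."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        dim = len(values[0])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return out


def condition_on_user_context(agent_feats: List[Vector],
                              context_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical residual conditioning: agent features (queries) attend over
    user-context tokens (keys = values here; no learned projections)."""
    attended = cross_attention(agent_feats, context_tokens, context_tokens)
    return [[a + c for a, c in zip(f, att)] for f, att in zip(agent_feats, attended)]


if __name__ == "__main__":
    agents = [[1.0, 0.0], [0.0, 1.0]]   # two toy agent embeddings
    user_goal = [[1.0, 0.0]]             # one toy "drag target" context token
    print(condition_on_user_context(agents, user_goal))  # [[2.0, 0.0], [1.0, 1.0]]
```

With a single context token the softmax weight is 1, so every agent receives that token's vector added to its own features; with several tokens, each agent receives a similarity-weighted mixture of them.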

Paper Structure

This paper contains 20 sections, 8 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: The dataset sample space. The left image illustrates the distribution of the collected data, while the right image shows the expanded sample space achieved through data augmentation. In this context, $x$ represents the different dimensions that constitute the dataset, and $\bm{v}$ represents the specific samples collected.
  • Figure 2: The generation pipeline. The Condition Context Query gathers personalized information from the user, either through an interactive UI or by retrieving it from the dataset. The Mixture of Experts Gate selects the appropriate model for inference based on the agent type. Input data are represented as agent-centric vectors. Once the initial solution is obtained, it is further refined through diffusion to generate the final scene.
  • Figure 3: Demonstration of scene creation, editing, and correction. Colored boxes represent agents, with a different size for each type: a motorcycle is depicted as a long bar, and a pedestrian as a square. The series of colored dots in front of each agent indicates its generated trajectory, and the shadow represents its past actions.
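The pipeline in Figure 2 (gate by agent type, obtain an initial solution, then refine it) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the per-type "experts" are hypothetical constant-velocity extrapolators standing in for the trained regression models, and `placeholder_denoiser` is not a diffusion model; it only lets the perturb-then-iteratively-denoise control flow run. All names (`TypeGate`, `constant_velocity_expert`, `refine`, `placeholder_denoiser`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Agent:
    agent_type: str       # e.g. "vehicle", "pedestrian", "cyclist"
    history: List[Point]  # past positions, oldest first (at least two)


Expert = Callable[[Agent, int], List[Point]]
Denoiser = Callable[[List[Point], int], List[Point]]


def constant_velocity_expert(speed_scale: float) -> Expert:
    """Hypothetical stand-in for a per-type regression model:
    extrapolates the last observed velocity, scaled by speed_scale."""
    def predict(agent: Agent, horizon: int) -> List[Point]:
        (x0, y0), (x1, y1) = agent.history[-2], agent.history[-1]
        vx, vy = (x1 - x0) * speed_scale, (y1 - y0) * speed_scale
        return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]
    return predict


class TypeGate:
    """Hard mixture-of-experts gate: routes each agent to the expert for its type."""

    def __init__(self, experts: Dict[str, Expert]):
        self.experts = experts

    def __call__(self, agent: Agent, horizon: int) -> List[Point]:
        if agent.agent_type not in self.experts:
            raise KeyError(f"no expert registered for {agent.agent_type!r}")
        return self.experts[agent.agent_type](agent, horizon)


def refine(initial: List[Point], denoise_step: Denoiser, num_steps: int,
           noise_scale: float, rng: random.Random) -> List[Point]:
    """Perturb the initial solution with Gaussian noise, then apply
    num_steps denoising steps from t = num_steps - 1 down to 0."""
    traj = [(x + rng.gauss(0.0, noise_scale), y + rng.gauss(0.0, noise_scale))
            for x, y in initial]
    for t in reversed(range(num_steps)):
        traj = denoise_step(traj, t)
    return traj


def placeholder_denoiser(initial: List[Point]) -> Denoiser:
    """NOT a diffusion model: a placeholder for a trained conditional denoiser
    that simply moves each point halfway back toward the initial solution."""
    def step(traj: List[Point], t: int) -> List[Point]:
        return [((x + cx) / 2, (y + cy) / 2) for (x, y), (cx, cy) in zip(traj, initial)]
    return step


if __name__ == "__main__":
    gate = TypeGate({
        "vehicle": constant_velocity_expert(1.0),
        "pedestrian": constant_velocity_expert(0.5),
    })
    ped = Agent("pedestrian", [(0.0, 0.0), (1.0, 0.0)])
    init = gate(ped, horizon=3)  # [(1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]
    scene = refine(init, placeholder_denoiser(init), num_steps=10,
                   noise_scale=0.5, rng=random.Random(0))
    print(init, scene)
```

This sketch uses a hard gate (exactly one expert per agent type), which mirrors the caption's description of selecting a model by agent type; a learned soft gate that blends several experts' outputs would be a different design choice.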