SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

TL;DR

The paper addresses the need for scalable, richly diverse synthetic data for autonomous driving and shows that prior generative pipelines produce limited foreground diversity. It introduces SubjectDrive, a latent-diffusion video generator with a subject control mechanism composed of a Subject Prompt Adapter (SPA), a Subject Visual Adapter (SVA), and Augmented Temporal Attention (ATA). Together, these components inject diverse external subjects and maintain temporal coherence. Extensive experiments on nuScenes show that scaling synthetic data improves detection and tracking, and that an external subject bank significantly increases these gains, even surpassing models pre-trained on nuImages. The work underscores the potential of controllable generative data to revolutionize data production for autonomous vehicle (AV) systems.

Abstract

Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. Therefore, we have developed a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources to produce varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.

Paper Structure

This paper contains 19 sections, 7 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Comparison between the existing method and the SubjectDrive framework. (a) The existing data generation framework uses the control sequence and sampling noise to generate synthetic data, which limits sample diversity and scalability. (b) Compared with the traditional framework, SubjectDrive introduces additional synthesis diversity by incorporating extra subject control, enhancing the scalability of the generative model. (c) Evaluation of data scaling on the nuScenes detection and tracking tasks.
  • Figure 2: Above: Existing autonomous driving generative models struggle to produce diverse foreground samples. Below: Our subject control methods enhance sampling diversity, significantly improving the diversity of the generated foreground samples.
  • Figure 3: Overview of SubjectDrive. The pipeline consists of a frozen auto-encoder and a trainable UNet-based diffusion model. Control signals come from different sources, including an extended text prompt, a condition layout, and a subject bank.
  • Figure 4: The Subject Prompt Adapter, which augments the original text embedding of the extended prompt with the corresponding ID identifier and visual semantic information to enhance the expressiveness of subjects.
  • Figure 5: The Subject Visual Adapter, which injects location-enhanced subject information into visual features through gated self-attention.
  • ...and 4 more figures
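The Figure 5 caption describes the Subject Visual Adapter only at a high level: location-enhanced subject information is injected into visual features through gated self-attention. The sketch below illustrates one plausible reading of that idea and assumes the gate follows the GLIGEN-style formulation. Each subject feature is concatenated with a Fourier embedding of its bounding box and projected to the model width. These subject tokens are appended to the visual tokens, and self-attention runs over the joint sequence. Only the visual positions are kept, and they are added back as a residual scaled by tanh(gamma). Every name here (`fourier_embed`, `gated_subject_injection`, the weight shapes, the frequency count) is hypothetical. The code is a single-head NumPy sketch with random weights and is not SubjectDrive's implementation.

```python
import numpy as np

# HYPOTHETICAL sketch of a GLIGEN-style gated self-attention block (an assumption,
# not SubjectDrive's released code). Single head, no LayerNorm, random weights.

def fourier_embed(boxes, num_freqs=4):
    """Map normalized boxes (N, 4) to sin/cos features of shape (N, 8 * num_freqs)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                      # (F,)
    angles = boxes[..., None] * freqs                                   # (N, 4, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)   # (N, 4, 2F)
    return feats.reshape(boxes.shape[0], -1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_subject_injection(visual, subject_emb, boxes, params, gamma):
    """visual: (M, D) tokens; subject_emb: (N, S) per-object features; boxes: (N, 4)."""
    # Location-enhanced subject tokens: [subject feature, box Fourier features] -> D dims.
    subj_in = np.concatenate([subject_emb, fourier_embed(boxes)], axis=-1)
    subj_tokens = subj_in @ params["w_proj"]                  # (N, D)
    tokens = np.concatenate([visual, subj_tokens], axis=0)    # (M + N, D)
    # Single-head self-attention over the joint visual + subject sequence.
    q, k, v = tokens @ params["wq"], tokens @ params["wk"], tokens @ params["wv"]
    attended = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    # Keep only the visual positions and add them back through a tanh gate.
    return visual + np.tanh(gamma) * attended[: visual.shape[0]]

rng = np.random.default_rng(0)
M, N, D, S, F = 16, 3, 8, 6, 4  # visual tokens, subjects, model dim, subject dim, frequencies
params = {
    "w_proj": rng.normal(scale=0.1, size=(S + 8 * F, D)),
    "wq": rng.normal(scale=0.1, size=(D, D)),
    "wk": rng.normal(scale=0.1, size=(D, D)),
    "wv": rng.normal(scale=0.1, size=(D, D)),
}
visual = rng.normal(size=(M, D))
subject_emb = rng.normal(size=(N, S))  # stand-in for features of subjects drawn from a bank
boxes = rng.uniform(size=(N, 4))       # random "boxes", adequate for a shape check only

out_closed = gated_subject_injection(visual, subject_emb, boxes, params, gamma=0.0)
out_open = gated_subject_injection(visual, subject_emb, boxes, params, gamma=1.0)
print(np.array_equal(out_closed, visual), out_open.shape)  # prints: True (16, 8)
```

With `gamma=0.0`, the block returns the visual tokens unchanged. For this reason, a zero-initialized gate is a common choice when adding new conditioning to a pretrained diffusion UNet: training starts from the original model's behavior, and the tanh keeps the injected residual bounded as the gate is learned.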