SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang
TL;DR
The paper addresses the need for scalable, diverse synthetic training data for autonomous driving and shows that prior generative pipelines are limited by low foreground diversity. It introduces SubjectDrive, a latent-diffusion video generator with subject control. Its three components, a subject prompt adapter (SPA), a subject visual adapter (SVA), and augmented temporal attention (ATA), inject diverse external subjects while maintaining temporal coherence. Experiments on nuScenes show that scaling up synthetic data improves detection and tracking, and that drawing on external subject banks increases the gains further, even surpassing models pre-trained on nuImages. The results suggest that controllable generative data could substantially change how training data for AV systems is produced.
Abstract
Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. Therefore, we have developed a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.
