Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

Xuanchi Ren; Yifan Lu; Tianshi Cao; Ruiyuan Gao; Shengyu Huang; Amirmojtaba Sabour; Tianchang Shen; Tobias Pfaff; Jay Zhangjie Wu; Runjian Chen; Seung Wook Kim; Jun Gao; Laura Leal-Taixe; Mike Chen; Sanja Fidler; Huan Ling

TL;DR

Cosmos-Drive-Dreams tackles the data bottleneck in autonomous driving by leveraging post-trained world foundation models to generate controllable, multi-view driving videos and LiDAR data. The approach combines precise layout-conditioned video generation, single-view-to-multi-view expansion, in-the-wild annotation, and weather-aware LiDAR synthesis, augmented by an LLM-driven prompt rewriter and a VLM-based rejection filter. Empirical results show consistent improvements across 3D lane detection, 3D object detection, LiDAR-based detection, and policy learning, especially in long-tail and corner-case scenarios. The work provides open-source models, datasets, and toolkits to enable scalable synthetic data generation and rapid experimentation. Overall, Cosmos-Drive-Dreams offers a practical pathway to scale synthetic data for safer, more robust autonomous driving systems.

Abstract

Collecting and annotating real-world data for safety-critical physical AI systems, such as autonomous vehicles (AVs), is time-consuming and costly. It is especially challenging to capture rare edge cases, which play a critical role in training and testing an AV system. To address this challenge, we introduce Cosmos-Drive-Dreams, a synthetic data generation (SDG) pipeline that aims to generate challenging scenarios to facilitate downstream tasks such as perception and driving policy training. Powering this pipeline is Cosmos-Drive, a suite of models specialized from the NVIDIA Cosmos world foundation model for the driving domain and capable of controllable, high-fidelity, multi-view, and spatiotemporally consistent driving video generation. We showcase the utility of these models by applying Cosmos-Drive-Dreams to scale the quantity and diversity of driving datasets with high-fidelity and challenging scenarios. Experimentally, we demonstrate that our generated data helps mitigate long-tail distribution problems and enhances generalization in downstream tasks such as 3D lane detection, 3D object detection, and driving policy learning. We open-source our pipeline toolkit, dataset, and model weights through NVIDIA's Cosmos platform. Project page: https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams
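
The TL;DR describes generation augmented by an LLM-driven prompt rewriter and a VLM-based rejection filter. The control flow of such a rewrite, generate, and filter loop might look roughly like the minimal sketch below. Everything in it is an assumption made for illustration: `generate_with_rejection`, `Sample`, and the three toy callables are hypothetical names, not part of the released toolkit, and the toy functions merely stand in for the LLM, the video model, and the VLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str  # rewritten text prompt used for generation
    video: str   # stand-in for generated video data (e.g. a file path)


def generate_with_rejection(
    base_prompt: str,
    rewrite: Callable[[str], List[str]],   # hypothetical: LLM prompt rewriter
    generate: Callable[[str], str],        # hypothetical: video generator
    accept: Callable[[str, str], bool],    # hypothetical: VLM-based filter
) -> List[Sample]:
    """Rewrite one prompt into variants, generate a video per variant,
    and keep only the videos that the filter accepts."""
    kept = []
    for prompt in rewrite(base_prompt):
        video = generate(prompt)
        if accept(prompt, video):
            kept.append(Sample(prompt, video))
    return kept


# Toy stand-ins so the sketch runs without any model (all hypothetical).
def toy_rewrite(prompt: str) -> List[str]:
    # Arbitrary variations chosen only to give the loop something to iterate.
    return [f"{prompt}, {w}" for w in ("sunny", "rainy", "foggy", "snowy")]


def toy_generate(prompt: str) -> str:
    return f"video<{prompt}>"


def toy_accept(prompt: str, video: str) -> bool:
    # Pretend the filter rejects the foggy clip.
    return "foggy" not in video


samples = generate_with_rejection(
    "a car merges onto a highway", toy_rewrite, toy_generate, toy_accept
)
```

Passing the three stages in as callables keeps the loop independent of any particular model; a real pipeline would substitute its own rewriter, generator, and filter behind the same interfaces.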
