MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu
TL;DR
MagicDrive3D is a two-stage framework. It first trains a multi-view video generator conditioned on road maps, 3D boxes, and text to produce consistent views, and then reconstructs a 3D Gaussian Splatting (3DGS) scene from those views using Fault-Tolerant Gaussian Splatting, monocular depth initialization, and appearance modeling for exposure differences. The approach enables any-view rendering of controllable street scenes and eases data collection by relying on routinely collected autonomous driving datasets such as nuScenes. It reports improvements in novel-view fidelity and video quality, shows that the generated data helps perception tasks such as BEV segmentation when used for augmentation, and supports applications in object-level dynamics and scene editing. The work offers a practical path toward realistic, controllable 3D street simulation for autonomous driving and beyond.
Abstract
Controllable generative models for images and videos have seen significant success, yet 3D scene generation, especially in unbounded scenarios like autonomous driving, remains underdeveloped. Existing methods lack flexible controllability and often rely on dense view data collection in controlled environments, limiting their generalizability across common datasets (e.g., nuScenes). In this paper, we introduce MagicDrive3D, a novel framework for controllable 3D street scene generation that combines video-based view synthesis with 3D representation (3DGS) generation. It supports multi-condition control, including road maps, 3D objects, and text descriptions. Unlike previous approaches that require a 3D representation before training, MagicDrive3D first trains a multi-view video generation model to synthesize diverse street views. This method utilizes routinely collected autonomous driving data, reducing data acquisition challenges and enriching 3D scene generation. In the 3DGS generation step, we introduce Fault-Tolerant Gaussian Splatting to address minor errors and use monocular depth for better initialization, alongside appearance modeling to manage exposure discrepancies across viewpoints. Experiments show that MagicDrive3D generates diverse, high-quality 3D driving scenes, supports any-view rendering, and enhances downstream tasks like BEV segmentation, demonstrating its potential for autonomous driving simulation and beyond.
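To make the depth-based initialization step more concrete, the following is a minimal sketch of how a depth map can be lifted to 3D points that could seed Gaussian centers. It is not the authors' implementation: the function name `unproject_depth`, the pinhole intrinsics `K`, the `cam_to_world` pose, and the assumption that the depth map holds metric z-depth (rather than the relative depth a monocular estimator usually outputs) are all illustrative choices.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) z-depth map to (H*W, 3) world-space points.

    Hypothetical helper: assumes a pinhole camera with intrinsics K (3x3)
    and a 4x4 camera-to-world pose. Depth is assumed to be metric z-depth.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    # Back-project pixels to camera-frame rays with z = 1, then scale by depth.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Apply the homogeneous camera-to-world transform.
    ones = np.ones((pts_cam.shape[0], 1))
    pts_h = np.concatenate([pts_cam, ones], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

# Toy example: a 2x4 image at constant depth 5 with an identity pose.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((2, 4), 5.0)
points = unproject_depth(depth, K, np.eye(4))
```

In practice, the points from each generated view would be merged across cameras, and monocular depth would likely need scale and shift alignment before it is usable this way; the sketch leaves that step out.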
