HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving

Qiang Li, Yingwenqi Jiang, Tuoxi Li, Duyu Chen, Xiang Feng, Yucheng Ao, Shangyue Liu, Xingchen Yu, Youcheng Cai, Yumeng Liu, Yuexin Ma, Xin Hu, Li Liu, Yu Zhang, Linkun Xu, Bingtao Gao, Xueyuan Wang, Shuchang Zhou, Xianming Liu, Ligang Liu

TL;DR

HybridWorldSim introduces a scalable, camera-driven closed-loop simulator that unifies static-scene reconstruction, built on a hybrid 3D Gaussian representation, with a diffusion-based dynamic scene generator. The static stage uses sky/ground Code-Gaussians and anchor-based Background Nodes to capture appearance and geometry across multiple traversals, while the dynamic stage enforces geometric and photometric consistency through explicit consistency conditions and a conditioned diffusion model. The accompanying MIRROR dataset provides multi-traversal, environment-diverse data for robust benchmarking. Empirical results show state-of-the-art static reconstruction, robust dynamic editing, and coherent closed-loop simulation, highlighting the system’s scalability and practical impact for end-to-end autonomous driving research.
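The idea of capturing appearance separately across traversals, and of controlling environment and lighting through a latent space, can be illustrated with a generic per-traversal appearance embedding in the spirit of NeRF in the Wild. This is a minimal sketch under loud assumptions, not the paper's Code-Gaussian formulation: each Gaussian is assumed to carry a small feature vector, each traversal a latent code, and a single affine layer plus sigmoid maps their concatenation to RGB. Interpolating between two traversal codes then changes color while geometry stays fixed. The names `modulate_color` and `lerp_codes`, the "day"/"night" codes, the dimensions, and the weights are all hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def modulate_color(feature: Sequence[float], code: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Hypothetical appearance head: RGB = sigmoid(W @ [feature; code] + b)."""
    x = list(feature) + list(code)
    rgb = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match feature + code length")
        rgb.append(sigmoid(sum(w * xi for w, xi in zip(row, x)) + b))
    return rgb


def lerp_codes(code_a: Sequence[float], code_b: Sequence[float],
               t: float) -> List[float]:
    """Linear interpolation between two traversal codes (t=0 -> a, t=1 -> b)."""
    return [(1.0 - t) * a + t * b for a, b in zip(code_a, code_b)]


# Illustrative values only: a 2-D per-Gaussian feature and 2-D traversal codes.
day_code, night_code = [1.0, 0.0], [0.0, 1.0]
feature = [0.5, -0.5]
W = [[0.2, 0.1, 1.0, -1.0],   # red channel
     [0.0, 0.3, 0.5, -0.5],   # green channel
     [0.1, 0.0, 0.2, 0.8]]    # blue channel
b = [0.0, 0.0, 0.0]

# Sweep the latent code from one traversal to the other; the feature is untouched.
for t in (0.0, 0.5, 1.0):
    print(t, modulate_color(feature, lerp_codes(day_code, night_code, t), W, b))
```

In a trained system the mapping would be a learned network rather than hand-set weights; the point of the sketch is only that swapping or blending the per-traversal code re-lights the same geometry.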

Abstract

Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint changes or to ensure geometric consistency. We introduce HybridWorldSim, a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents. This unified design addresses key limitations of previous methods, enabling the creation of diverse and high-fidelity driving scenarios with reliable visual and spatial consistency. To facilitate robust benchmarking, we further release a new multi-traversal dataset, MIRROR, that captures a wide range of routes and environmental conditions across different cities. Extensive experiments demonstrate that HybridWorldSim surpasses previous state-of-the-art methods, providing a practical and scalable solution for high-fidelity simulation and a valuable resource for research and development in autonomous driving.

Paper Structure

This paper contains 28 sections, 12 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: We introduce HybridWorldSim, a scalable simulator that couples multi-trajectory neural reconstruction for static backgrounds with generative modeling for dynamic agents. It enables (1) flexible scene extension, which uses newly collected data as reference videos so that new dynamic objects can be incorporated into the simulator with high fidelity from any viewpoint; this extension requires no retraining, ensuring high scalability and efficiency across diverse simulation scenarios; (2) environment and lighting control via latent-space manipulation of the Hybrid Gaussians; and (3) dynamic-agent editing, which supports a wide range of vehicle types and behaviors.
  • Figure 2: We present our multi-traversal driving dataset MIRROR, collected using various mass-production vehicles, each equipped with a standardized seven-camera rig providing 360-degree coverage. The MIRROR dataset captures realistic driving patterns through naturalistic driving behaviors, exhibits rich multi-traversal diversity with repeated passes through the same regions, and encompasses diverse environmental conditions, including varying weather and illumination.
  • Figure 3: Our framework consists of two main stages: static scene reconstruction and dynamic scene generation. The static stage uses a hybrid 3D Gaussian representation to reconstruct scenes from multiple trajectories, with a multi-node design and trajectory embeddings to decouple scene components and environmental conditions. Given a source view image and a target view, the dynamic stage combines the reconstructed static scene with diffusion-based vehicle generation to synthesize view-consistent dynamic agents.
  • Figure 4: We compare our static scene reconstruction module with OmniRe chen2025omnire (single-traversal) and MTGS li2025mtgs (multi-traversal) on the nuScenes caesar2020nuscenes and MIRROR datasets.
  • Figure 5: We compare with DriveEditor liang2024driveeditor on vehicle translation tasks, where the reference vehicle is offset horizontally. We visualize the projected bounding boxes at the target positions.
  • ...and 6 more figures
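Figure 5 visualizes projected bounding boxes at target positions after the reference vehicle is offset horizontally. The geometry behind such an overlay is a standard pinhole projection, sketched below under stated assumptions rather than taken from the paper: an OpenCV-style camera frame (x right, y down, z forward); a forward-facing camera, so that a horizontal offset corresponds to a shift along camera x; a box parameterized by center, (length, height, width), and yaw about the camera y axis; and made-up intrinsics. The helpers `box_corners`, `project`, `projected_box`, and `offset_laterally` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention: x right, y down, z forward; size = (length, height,
    width); yaw rotates the box about the camera y axis.
    """
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the local offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project(point: Point3, fx: float, fy: float, px: float, py: float) -> Point2:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + px, fy * y / z + py)


def projected_box(corners: Sequence[Point3], fx: float, fy: float,
                  px: float, py: float) -> Tuple[float, float, float, float]:
    """Axis-aligned 2D box (u_min, v_min, u_max, v_max) around the projected corners."""
    uv = [project(p, fx, fy, px, py) for p in corners]
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    return (min(us), min(vs), max(us), max(vs))


def offset_laterally(center: Point3, dx: float) -> Point3:
    """Shift an object sideways along camera x to a hypothetical target position."""
    return (center[0] + dx, center[1], center[2])


# Usage with hypothetical intrinsics and a hypothetical vehicle box.
fx = fy = 1000.0
px, py = 960.0, 540.0
ref_center = (0.0, 0.5, 10.0)
size = (4.0, 1.5, 2.0)
ref_box = projected_box(box_corners(ref_center, size, 0.3), fx, fy, px, py)
target_box = projected_box(
    box_corners(offset_laterally(ref_center, 2.0), size, 0.3), fx, fy, px, py)
print("reference:", ref_box)
print("target:   ", target_box)
```

The axis-aligned 2D box is simply the min/max of the eight projected corners; a wireframe overlay would instead connect the projected corners pairwise along the box edges.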