DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen

TL;DR

DrivingSphere addresses a gap in autonomous driving evaluation: open-loop evaluation on high-fidelity synthetic data gives no feedback on the agent's decisions, while existing closed-loop simulators produce sensor data that differs from the real world. It builds a 4D occupancy-based driving world and renders it as high-fidelity multi-view video. It combines OccDreamer for static background generation, an actor bank for dynamic participants, and VideoDreamer, which uses dual-path encoding and ID-aware actor representations to maintain spatial-temporal coherence. The framework introduces an agent coordination loop in which Ego and Environment Agents interact in a continuous feedback cycle, and it reports improved visual fidelity, temporal consistency, and driving-performance metrics in open- and closed-loop tests on nuScenes. This reduces the simulation-to-real-world domain gap and provides a practical platform for validating and improving vision-based autonomous driving systems.
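The feedback cycle between Ego and Environment Agents can be illustrated with a toy rollout. This is a minimal sketch under stated assumptions, not DrivingSphere's implementation: the names `WorldState`, `render`, `ego_agent`, `environment_agent`, and `closed_loop_rollout` are hypothetical, the world is a one-dimensional road, and `render` is a trivial stand-in for a video renderer.

```python
from dataclasses import dataclass


@dataclass
class WorldState:
    """Hypothetical toy world: two vehicles on a 1D road (positions in metres)."""
    ego_pos: float
    lead_pos: float


def render(state: WorldState) -> float:
    """Stand-in for a sensor renderer: the ego only observes the gap ahead."""
    return state.lead_pos - state.ego_pos


def ego_agent(observation: float) -> float:
    """Toy driving policy: speed (m/s) grows with the gap, clamped to [0, 10]."""
    return max(0.0, min(10.0, 0.5 * (observation - 5.0)))


def environment_agent(state: WorldState) -> float:
    """Toy traffic behaviour: the lead vehicle slows when the ego is close."""
    gap = state.lead_pos - state.ego_pos
    return 8.0 if gap > 10.0 else 4.0


def closed_loop_rollout(state: WorldState, steps: int, dt: float = 0.5) -> list[float]:
    """Run the feedback cycle and return the gap after every step."""
    gaps = []
    for _ in range(steps):
        observation = render(state)            # world -> sensor data
        ego_speed = ego_agent(observation)     # sensor data -> ego action
        lead_speed = environment_agent(state)  # world -> traffic action
        # Both actions update the world, so the next observation depends on them.
        state = WorldState(
            ego_pos=state.ego_pos + ego_speed * dt,
            lead_pos=state.lead_pos + lead_speed * dt,
        )
        gaps.append(state.lead_pos - state.ego_pos)
    return gaps


if __name__ == "__main__":
    print(closed_loop_rollout(WorldState(ego_pos=0.0, lead_pos=30.0), steps=5))
```

In an open-loop setting the observations would come from a fixed recording and the ego's actions would never alter them. Here each action changes the state that is rendered at the next step, which is the property a closed-loop evaluation relies on.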

Abstract

Autonomous driving evaluation requires simulation environments that closely replicate actual road conditions, including real-world sensory data and responsive feedback loops. However, many existing simulations only predict waypoints along fixed routes on public datasets or synthetic photorealistic data; that is, they are open-loop and usually cannot assess dynamic decision-making. Recent closed-loop simulators offer feedback-driven environments, but they either cannot process visual sensor inputs or produce outputs that differ from real-world data. To address these challenges, we propose DrivingSphere, a realistic and closed-loop simulation framework. Its core idea is to build a 4D world representation and generate realistic, controllable driving scenarios. Specifically, our framework includes a Dynamic Environment Composition module that constructs a detailed 4D driving world in occupancy format, populated with static backgrounds and dynamic objects, and a Visual Scene Synthesis module that transforms this data into high-fidelity, multi-view video outputs while ensuring spatial and temporal consistency. By providing a dynamic and realistic simulation environment, DrivingSphere enables comprehensive testing and validation of autonomous driving algorithms, ultimately advancing the development of more reliable autonomous cars. The benchmark will be publicly released.

Paper Structure

This paper contains 12 sections, 14 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of frameworks for evaluating end-to-end Autonomous Driving (AD) algorithms. (a) Open-loop evaluation uses waypoint predictions along fixed routes in pre-collected datasets [Caesar_2020_CVPR_nuscenes]. Generative models [gao2023magicdrive, wang2023drivedreamer, zhao2024drivedreamer2] create diverse, realistic data but lack dynamic feedback to assess AD responses to dynamic changes. (b) Simulated closed-loop evaluation [dosovitskiy2017carla, li2021metadrive] offers feedback-driven, scalable environments where agent actions impact simulation dynamics; however, sensory outputs often differ from real-world data, limiting effectiveness for algorithms trained on real data. (c) Our generative closed-loop simulation framework, DrivingSphere, addresses these limitations by delivering realistic visual inputs and continuous, responsive feedback between the AD agent and environment.
  • Figure 2: Overview of the DrivingSphere framework. (a) The Dynamic Environment Composition module builds a 4D driving world, simulating real driving scenarios with backgrounds generated by OccDreamer, dynamic actors from an actor bank, and trajectories guided by a transportation simulator. (b) The Visual Scene Synthesis module produces high-fidelity, photo-realistic video frames conditioned on global semantics, view-specific details, and scene prompts, supporting closed-loop evaluation. Control signals enable adaptive feedback for driving agents, facilitating continuous testing and evaluation of driving algorithms in the simulated environment.
  • Figure 3: The framework of OccDreamer, which includes an Occupancy Tokenizer, Region Occupancy Generation, and Scene Extension to generate a city-level background for the 4D driving world.
  • Figure 4: Overview of VideoDreamer. The model conditions on the 4D driving world and enriched actor embeddings (e.g., actor ID, position, and caption). This ensures high visual fidelity and geometric consistency in the generated driving simulations.
  • Figure 5: Qualitative results of generated 3D scenes. OccDreamer uses a text prompt and a BEV map as guidance to generate 3D scenes with controllable regional content and road structure.
  • ...and 1 more figures