DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

TL;DR

DriveArena introduces a modular, high-fidelity closed-loop driving simulation platform that couples a traffic engine capable of global road-network traffic generation with a diffusion-based World Dreamer that renders realistic surround-view images. By closing the perception-action loop through image-based driving agents, it enables iterative, diverse scenario exploration and robust evaluation of vision-based autonomous driving systems. The work demonstrates higher fidelity and controllability than prior generators and supports open- and closed-loop experiments with a representative agent (UniAD), highlighting both the promise and current limitations of closed-loop simulation for driving research. The platform is designed for extensibility and aims to bridge sim-to-real gaps, offering a practical venue for evaluating and evolving driving agents and generative scene models.

Abstract

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture that allows its two core components to be swapped independently: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any worldwide street map, and World Dreamer, a high-fidelity conditional generative model with infinite autoregression. Together, they allow any driving agent capable of processing real-world images to navigate in DriveArena's simulated environment. The agent perceives its surroundings through images generated by World Dreamer and outputs trajectories. These trajectories are fed into Traffic Manager, which produces realistic interactions with other vehicles and a new scene layout. Finally, the latest scene layout is relayed back into World Dreamer, perpetuating the simulation cycle. This iterative process enables closed-loop exploration within a highly realistic environment, providing a valuable platform for developing and evaluating driving agents across diverse and challenging scenarios. DriveArena marks a substantial step forward in leveraging generative image data for driving simulation platforms and offers insights into closed-loop autonomous driving. Code will be available soon on GitHub: https://github.com/PJLab-ADG/DriveArena
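
The simulation cycle described in the abstract can be sketched as a short loop. This is a minimal illustration only, not DriveArena's actual code: the class names StubTrafficManager, StubWorldDreamer, and StubAgent, their method signatures, the SceneLayout fields, and the choice of six camera views are all assumptions made for the example, and the "images" are placeholder strings rather than generated frames.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (dx, dy) offset in the ego frame, in metres


@dataclass(frozen=True)
class SceneLayout:
    """Hypothetical stand-in for the scene layout produced each step."""
    step: int
    ego_xy: Tuple[float, float]  # ego position in the world frame


class StubTrafficManager:
    """Toy traffic engine: tracks only the ego vehicle (assumption)."""

    def __init__(self) -> None:
        self.layout = SceneLayout(step=0, ego_xy=(0.0, 0.0))

    def step(self, trajectory: List[Waypoint]) -> SceneLayout:
        # Advance the ego by the first planned waypoint; a real engine
        # would also update background traffic and check collisions.
        dx, dy = trajectory[0]
        x, y = self.layout.ego_xy
        self.layout = SceneLayout(self.layout.step + 1, (x + dx, y + dy))
        return self.layout


class StubWorldDreamer:
    """Toy renderer: returns labels instead of generated images."""

    def render(self, layout: SceneLayout, previous: List[str]) -> List[str]:
        # `previous` stands for the autoregressive reference frames;
        # this stub ignores them. Six views is an arbitrary choice.
        return [f"cam{i}@step{layout.step}" for i in range(6)]


class StubAgent:
    """Toy image-based agent: always plans to drive straight ahead."""

    def plan(self, images: List[str]) -> List[Waypoint]:
        return [(1.0, 0.0), (2.0, 0.0)]


def run_closed_loop(traffic, dreamer, agent, num_steps: int) -> List[SceneLayout]:
    """Layout -> images -> trajectory -> new layout, repeated."""
    layout = traffic.layout
    layouts = [layout]
    images: List[str] = []
    for _ in range(num_steps):
        images = dreamer.render(layout, images)  # scene layout -> images
        trajectory = agent.plan(images)          # images -> ego trajectory
        layout = traffic.step(trajectory)        # trajectory -> new layout
        layouts.append(layout)
    return layouts


if __name__ == "__main__":
    for state in run_closed_loop(
        StubTrafficManager(), StubWorldDreamer(), StubAgent(), num_steps=3
    ):
        print(state)
```

Because each component is reached only through a single method in this sketch, any one of the three can be replaced without touching the loop, which mirrors the modularity the abstract emphasizes.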

Paper Structure (20 sections, 3 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Comparison of DriveArena with existing autonomous driving methods and platforms along the dimensions of Interactivity and Fidelity. Interactivity indicates the platform's control over vehicles, ranging from open-loop, uncontrollable closed-loop, to controllable closed-loop. Fidelity reflects the realism of driving scenarios, categorized from bottom to top as: traffic-flow only, unrealistic scenes, realistic scenes, and diverse scenes. DriveArena uniquely occupies the top-right, being the first simulation platform to generate diverse traffic scenarios and surround-view images with closed-loop controllability for all vehicles. For detailed descriptions of these methods, please refer to Table \ref{tab:comparison}.
  • Figure 2: Overview of the DriveArena framework. The system consists of two main components: (1) The Traffic Manager, which processes Internet-downloaded HD maps to create diverse urban layouts, manages vehicle movements including background traffic, and handles collision detection. (2) The World Dreamer, an auto-regressive generative model that generates photo-realistic, multi-view camera images corresponding to the simulation state, with controllable parameters following given prompts. The framework operates in a closed loop: generated images are fed to the AD agent, which outputs the planned ego trajectory. The trajectory is then fed back into the Traffic Manager for the next simulation step.
  • Figure 3: The figure illustrates the denoising process employed by World Dreamer. Beginning with randomly sampled noise, the autoregressive model utilizes various conditions—such as multi-view layout, BEV map, text prompt, reference image, relative pose, camera parameters, and 3D bounding boxes—to enhance the denoising procedure. The encoders depicted in the figure are distinct, with the color indicating whether each one utilizes a pre-trained network and is frozen. Additionally, we incorporate ControlNet to introduce conditional control into the diffusion model.
  • Figure 4: Demonstration of reference image influence on generated scenes. Three scenes are presented, all derived from a single nuScenes reference frame. Despite notable variations in road networks, World Dreamer successfully integrates street styles and weather conditions from the reference image while adhering to specified control conditions for vehicles and road layouts. Of particular interest is the aerial corridor visible in the reference image, which is accurately reproduced in scenes #1 and #2. However, in scene #3, due to the curved road configuration, the corridor is not generated, illustrating World Dreamer's adaptability to different road geometries.
  • Figure 5: Demonstration of the influence of diverse prompts and reference images on identical scenes. The figure presents four distinct image sequences generated by DriveArena for the same 30-second simulation sequence, each utilizing different prompts and reference images. All sequences strictly adhere to the provided control conditions for road structures and vehicles, maintaining cross-view consistency. Notably, the four sequences exhibit significant variations in weather and lighting conditions, while consistently preserving their respective styles throughout the entire 30-second duration. https://pjlab-adg.github.io/DriveArena/
  • ...and 4 more figures