RealEngine: Simulating Autonomous Driving in Realistic Context
Junzhe Jiang, Nan Song, Jingyu Li, Xiatian Zhu, Li Zhang
TL;DR
RealEngine is a driving simulation framework that reconstructs background scenes and foreground traffic participants separately, then composes them to render photorealistic, multi-modal sensor data (camera and LiDAR) in a closed-loop setting. It supports flexible scene composition and three evaluation settings: non-reactive simulation, safety testing, and multi-agent interaction. Backgrounds are reconstructed efficiently with StreetGaussians and GS-LiDAR, and foreground agents are represented as 3D meshes, with diffusion-guided lighting and differentiable relighting used to bridge the gap between realism and controllability. In experiments on Navsim/nuPlan data, RealEngine reports improved reconstruction fidelity, stable closed-loop trajectories, and meaningful PDMS-based assessments, positioning it as a practical benchmark for evaluating the real-world performance of driving agents in diverse, interactive scenarios.
Abstract
Driving simulation plays a crucial role in developing reliable driving agents by providing controlled, evaluative environments. To enable meaningful assessments, a high-quality driving simulator must satisfy several key requirements: multi-modal sensing capabilities (e.g., camera and LiDAR) with realistic scene rendering to minimize observational discrepancies; closed-loop evaluation to support free-form trajectory behaviors; highly diverse traffic scenarios for thorough evaluation; multi-agent cooperation to capture interaction dynamics; and high computational efficiency to ensure affordability and scalability. However, existing simulators and benchmarks fail to comprehensively meet these fundamental criteria. To bridge this gap, this paper introduces RealEngine, a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques to achieve realistic and flexible closed-loop simulation in the driving context. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios through flexible scene composition. This synergistic fusion of scene reconstruction and view synthesis enables photorealistic rendering across multiple sensor modalities, ensuring both perceptual fidelity and geometric accuracy. Building upon this environment, RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction, collectively forming a reliable and comprehensive benchmark for evaluating the real-world performance of driving agents.
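To make the closed-loop requirement concrete, the following is a minimal sketch of a closed-loop rollout, not RealEngine's implementation. Every name in it (`EgoState`, `render`, `step`, `closed_loop_rollout`, `cruise_policy`) is hypothetical, the sensor rendering is replaced by a placeholder that returns the ego pose, and the unicycle motion model is an assumption chosen for brevity. The point it illustrates is that each observation is produced from the state the agent's own previous action led to, rather than from a pre-recorded log.

```python
import math
from dataclasses import dataclass


@dataclass
class EgoState:
    x: float        # position in metres
    y: float
    heading: float  # radians
    speed: float    # metres per second


def step(state, accel, yaw_rate, dt=0.1):
    """Advance the ego vehicle with a simple unicycle model (assumption)."""
    speed = max(0.0, state.speed + accel * dt)
    heading = state.heading + yaw_rate * dt
    return EgoState(
        x=state.x + speed * math.cos(heading) * dt,
        y=state.y + speed * math.sin(heading) * dt,
        heading=heading,
        speed=speed,
    )


def render(state):
    """Placeholder for sensor rendering (hypothetical).

    A real simulator would return camera images and LiDAR sweeps
    rendered from this pose; here we only return the pose and speed.
    """
    return {"pose": (state.x, state.y, state.heading), "speed": state.speed}


def closed_loop_rollout(policy, init_state, num_steps, dt=0.1):
    """Roll out a policy so that its actions determine future observations."""
    state = init_state
    trajectory = [state]
    for _ in range(num_steps):
        obs = render(state)              # observe from the current, self-induced pose
        accel, yaw_rate = policy(obs)    # agent chooses an action
        state = step(state, accel, yaw_rate, dt)
        trajectory.append(state)
    return trajectory


def cruise_policy(obs, target_speed=5.0):
    """Toy policy: accelerate straight ahead until the target speed is reached."""
    accel = 1.0 if obs["speed"] < target_speed else 0.0
    return accel, 0.0


if __name__ == "__main__":
    traj = closed_loop_rollout(cruise_policy, EgoState(0.0, 0.0, 0.0, 0.0), 20)
    print(len(traj), round(traj[-1].speed, 3))
```

In an open-loop (log-replay) evaluation, by contrast, `render` would be fed the recorded ego pose at each step regardless of what the policy output, so deviations from the logged trajectory would never show up in the observations.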
