ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Xinze Chen, Guanghong Jia, Guan Huang, Wenjun Mei

TL;DR

ReconDreamer-RL tackles the sim2real gap in closed-loop autonomous driving by integrating video diffusion priors into scene reconstruction. It introduces three key components: ReconSimulator, which pairs diffusion-based appearance modeling with a kinematic model for physical modeling; the Dynamic Adversary Agent (DAA), which generates corner-case traffic; and the Cousin Trajectory Generator (CTG), which diversifies training data. Training proceeds in two stages, imitation learning followed by reinforcement learning, in a high-fidelity 3DGS environment. The framework improves on imitation learning baselines and prior RL approaches, including a 5x reduction in the Collision Ratio relative to imitation learning methods, and broadens corner-case coverage. By delivering realistic sensor rendering and physically plausible dynamics, ReconDreamer-RL provides a practical pathway to more robust end-to-end autonomous driving policies.

Abstract

Reinforcement learning for training end-to-end autonomous driving models in closed-loop simulations is attracting growing attention. However, most simulation environments differ significantly from real-world conditions, creating a substantial simulation-to-reality (sim2real) gap. To bridge this gap, some approaches use scene reconstruction techniques to build photorealistic simulation environments. While this makes sensor simulation more realistic, these methods are inherently constrained by the distribution of the training data, making it difficult to render high-quality sensor data for novel trajectories or corner-case scenarios. Therefore, we propose ReconDreamer-RL, a framework designed to integrate video diffusion priors into scene reconstruction to aid reinforcement learning, thereby enhancing end-to-end autonomous driving training. Specifically, in ReconDreamer-RL, we introduce ReconSimulator, which uses the video diffusion prior for appearance modeling and a kinematic model for physical modeling, thereby reconstructing driving scenarios from real-world data. This narrows the sim2real gap for closed-loop evaluation and reinforcement learning. To cover more corner-case scenarios, we introduce the Dynamic Adversary Agent (DAA), which adjusts the trajectories of surrounding vehicles relative to the ego vehicle, autonomously generating corner-case traffic scenarios (e.g., cut-in). Finally, the Cousin Trajectory Generator (CTG) is proposed to address the bias of the training data distribution, which is often skewed toward simple straight-line movements. Experiments show that ReconDreamer-RL improves end-to-end autonomous driving training, outperforming imitation learning methods with a 5x reduction in the Collision Ratio.

Paper Structure

This paper contains 56 sections, 17 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: In ReconDreamer-RL, ReconSimulator improves appearance modeling with ReconDreamer and incorporates physical modeling to reconstruct driving scenes. In the imitation learning stage, DAA generates corner-case scenario trajectories and CTG diversifies the ego vehicle's actions; ReconSimulator then renders the sensor data used to train the policy. In the reinforcement learning stage, the policy is trained in a closed-loop environment, interacting with DAA-controlled surrounding vehicles.
  • Figure 2: The process of integrating the diffusion prior for appearance modeling. During the reconstruction of driving scenes, we first render videos along novel trajectories. These rendered videos are then processed by the DriveRestorer to enhance their visual quality, and the restored results are used to further optimize the reconstruction model. This iterative process continues until the reconstruction model converges.
  • Figure 3: Examples of Dynamic Adversary Agent (DAA) controlling surrounding vehicles to simulate cut-in scenarios.
  • Figure 4: The pipeline of the DAA. DAA identifies the target vehicles by their distance to the ego car in the bird's-eye view (BEV), where the blue line represents the ego car’s trajectory and the red line represents the target vehicle. Then, DAA generates novel trajectories based on the specified interactive behavior. The generated trajectories are checked, and feasible ones are rendered using ReconSimulator.
  • Figure 5: Cousin Trajectory Generator (CTG) generates cousin trajectories and performs trajectory checks to eliminate unreasonable trajectories (e.g., the pink cross marks), and finally renders the corresponding sensor data in the ReconSimulator.
  • ...and 10 more figures
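
The DAA pipeline in the Figure 4 caption (select nearby target vehicles in BEV, generate a trajectory for an interactive behavior such as cut-in, then check feasibility before rendering) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the distance threshold, the smoothstep lateral blend, and the two feasibility criteria (a minimum gap to the ego and a maximum per-step displacement) are all assumptions chosen for illustration, and rendering with ReconSimulator is omitted.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (x, y) position in a BEV frame, in metres
Trajectory = List[Point]      # one position per timestep


def select_targets(ego: Trajectory, others: Dict[str, Trajectory],
                   max_dist: float = 15.0) -> List[str]:
    """Return ids of vehicles starting within max_dist of the ego, nearest first.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    ex, ey = ego[0]
    dists = {vid: math.hypot(traj[0][0] - ex, traj[0][1] - ey)
             for vid, traj in others.items()}
    return sorted((vid for vid, d in dists.items() if d <= max_dist),
                  key=lambda vid: dists[vid])


def cut_in(target: Trajectory, ego: Trajectory,
           start: int, duration: int) -> Trajectory:
    """Blend the target's lateral (y) coordinate toward the ego's y.

    The blend begins at timestep `start` and completes after `duration`
    steps (duration must be > 0); a smoothstep weight keeps the lateral
    motion gradual at both ends. Assumes both trajectories have equal length.
    """
    new_traj = []
    for i, (x, y) in enumerate(target):
        a = min(max((i - start) / duration, 0.0), 1.0)
        s = a * a * (3.0 - 2.0 * a)  # smoothstep weight in [0, 1]
        new_traj.append((x, (1.0 - s) * y + s * ego[i][1]))
    return new_traj


def is_feasible(traj: Trajectory, ego: Trajectory,
                min_gap: float = 2.0, max_step: float = 3.0) -> bool:
    """Assumed feasibility check: reject a trajectory that comes within
    min_gap of the ego at the same timestep, or that moves more than
    max_step metres between consecutive timesteps."""
    for i, (x, y) in enumerate(traj):
        if math.hypot(x - ego[i][0], y - ego[i][1]) < min_gap:
            return False
        if i > 0 and math.hypot(x - traj[i - 1][0], y - traj[i - 1][1]) > max_step:
            return False
    return True


# Toy scene: ego drives straight along y = 0; car_a is 8 m ahead in the
# adjacent lane (y = 3.5); car_b is far away and should not be selected.
ego_traj = [(float(i), 0.0) for i in range(20)]
vehicles = {
    "car_a": [(8.0 + i, 3.5) for i in range(20)],
    "car_b": [(100.0 + i, 3.5) for i in range(20)],
}

targets = select_targets(ego_traj, vehicles)
candidates = {vid: cut_in(vehicles[vid], ego_traj, start=5, duration=10)
              for vid in targets}
feasible = {vid: traj for vid, traj in candidates.items()
            if is_feasible(traj, ego_traj)}
```

In this toy scene only the nearby vehicle is selected, and its cut-in trajectory ends in the ego's lane while staying ahead of the ego. A fuller check would presumably also consider road boundaries and other vehicles, which this sketch leaves out.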