ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation

Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang

TL;DR

ReconDreamer++ tackles the domain gap between generated and real observations in driving-scene reconstruction by introducing the Novel Trajectory Deformable Network (NTDNet) to learn spatial deformations and by modeling the ground surface with fixed 3D Gaussian priors. This three-component scene representation (ground, non-ground background, dynamic objects), paired with depth-guided optimization and selective deformation for novel trajectories, yields substantial improvements over prior methods, especially on novel viewpoints. Across Waymo, nuScenes, PandaSet, and EUVS, ReconDreamer++ achieves higher fidelity in both foreground and road-surface rendering, with notable gains in NTA-IoU, NTL-IoU, and FID, demonstrating robustness to large trajectory shifts. The work advances closed-loop autonomous driving simulation by improving the geometric consistency of structured elements and reducing cumulative errors between the reconstruction and generative models, enabling more reliable scene synthesis for planning and evaluation.

Abstract

Combining reconstruction models with generative models has emerged as a promising paradigm for closed-loop simulation in autonomous driving. For example, ReconDreamer has demonstrated remarkable success in rendering large-scale maneuvers. However, a significant gap remains between the generated data and real-world sensor observations, particularly in terms of fidelity for structured elements, such as the ground surface. To address these challenges, we propose ReconDreamer++, an enhanced framework that significantly improves the overall rendering quality by mitigating the domain gap and refining the representation of the ground surface. Specifically, ReconDreamer++ introduces the Novel Trajectory Deformable Network (NTDNet), which leverages learnable spatial deformation mechanisms to bridge the domain gap between synthesized novel views and original sensor observations. Moreover, for structured elements such as the ground surface, we preserve geometric prior knowledge in 3D Gaussians, and the optimization process focuses on refining appearance attributes while preserving the underlying geometric structure. Experimental evaluations conducted on multiple datasets (Waymo, nuScenes, PandaSet, and EUVS) confirm the superior performance of ReconDreamer++. Specifically, on Waymo, ReconDreamer++ achieves performance comparable to Street Gaussians for the original trajectory while significantly outperforming ReconDreamer on novel trajectories. In particular, it achieves substantial improvements, including a 6.1% increase in NTA-IoU, a 23.0% improvement in FID, and a remarkable 4.5% gain in the ground surface metric NTL-IoU, highlighting its effectiveness in accurately reconstructing structured elements such as the road surface.
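
The ground-surface idea in the abstract (keep the geometric prior, refine only appearance) can be illustrated with a minimal sketch. This is not the authors' implementation: the split into `GEOMETRY_KEYS` and `APPEARANCE_KEYS`, the function `sgd_step`, the plain gradient step, and all numeric values are assumptions made for illustration only.

```python
# Hypothetical parameter split. The abstract only states that geometry is
# preserved and appearance is refined; which attributes fall in each group
# is an assumption here.
GEOMETRY_KEYS = ("position", "scale", "rotation")
APPEARANCE_KEYS = ("color", "opacity")


def sgd_step(params, grads, lr=0.1, trainable=APPEARANCE_KEYS):
    """Return a new parameter dict in which only `trainable` keys are updated."""
    updated = {}
    for name, values in params.items():
        if name in trainable:
            # Appearance attributes take a plain gradient-descent step.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Geometry attributes are copied unchanged (the frozen prior).
            updated[name] = list(values)
    return updated


# One toy ground Gaussian with made-up values.
ground = {
    "position": [0.0, 0.0, 1.5],
    "scale": [0.2, 0.2, 0.01],
    "rotation": [1.0, 0.0, 0.0, 0.0],
    "color": [0.5, 0.5, 0.5],
    "opacity": [0.9],
}
# Dummy gradients of 1.0 for every attribute, geometry included.
grads = {name: [1.0] * len(values) for name, values in ground.items()}

refined = sgd_step(ground, grads)
```

Even though gradients are supplied for every attribute, only `color` and `opacity` change in `refined`; the geometry entries are returned as they were.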

Paper Structure

This paper contains 17 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Comparison of ReconDreamer++ with SOTA methods, Street Gaussians and ReconDreamer, on original and novel trajectories. Left: ReconDreamer++ demonstrates superior rendering performance for both vehicle foregrounds and road surfaces compared to existing SOTA methods. Right: ReconDreamer++ significantly improves performance on novel trajectories while maintaining high rendering quality on the original trajectory.
  • Figure 2: The overall framework of ReconDreamer++. The driving scene is decomposed into three components: the ground surface, non-ground static background, and dynamic objects. For camera poses on the original trajectory, rendering is performed directly by skipping NTDNet. For camera poses on novel trajectories, the Gaussian parameters are refined through NTDNet to perform rendering.
  • Figure 3: Qualitative comparisons of different trajectory renderings on Waymo. The orange boxes highlight that ReconDreamer++ significantly improves rendering quality over various SOTA methods (FreeVS, Street Gaussians, DriveDreamer4D, ReconDreamer).
  • Figure 4: Ablation studies on the depth loss, ground model and NTDNet. When none of these components are employed, the model corresponds to ReconDreamer. In contrast, when all of these components are integrated, the model represents ReconDreamer++. This highlights the incremental contributions of each component in enhancing the overall performance.
  • Figure 5: Qualitative comparisons of different trajectory renderings on nuScenes. The orange boxes highlight that ReconDreamer++ significantly improves rendering quality compared with Street Gaussians and ReconDreamer.
  • ...and 2 more figures
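
The routing described in the Figure 2 caption (skip NTDNet on the original trajectory, refine the Gaussian parameters through NTDNet on novel trajectories) can be sketched as below. This is a hedged illustration, not the paper's code: `params_for_pose` and `toy_deform` are hypothetical names, and `toy_deform` is a fixed offset standing in for a learned deformation network whose architecture is not described here.

```python
def toy_deform(gaussian):
    """Stand-in for a learned deformation: shift the position by a constant.

    The real NTDNet is learned; this fixed offset is purely illustrative.
    """
    x, y, z = gaussian["position"]
    return {**gaussian, "position": (x + 0.1, y, z)}


def params_for_pose(gaussians, on_novel_trajectory, deform=toy_deform):
    """Select the Gaussian parameters used to render a given camera pose."""
    if not on_novel_trajectory:
        # Original trajectory: render directly, skipping the deformation.
        return gaussians
    # Novel trajectory: refine every Gaussian before rendering.
    return [deform(g) for g in gaussians]


# A one-Gaussian toy scene with made-up values.
scene = [{"position": (0.0, 0.0, 0.0), "color": (0.5, 0.5, 0.5)}]

original_view = params_for_pose(scene, on_novel_trajectory=False)
novel_view = params_for_pose(scene, on_novel_trajectory=True)
```

In this sketch the deformation returns new dictionaries rather than mutating the scene, so parameters used for the original trajectory are unaffected by rendering a novel one.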