FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis

Shijie Chen, Peixi Peng

TL;DR

FreeGen tackles free-viewpoint driving scene synthesis from a single trajectory by coupling a fast feed-forward 3D Gaussian Splatting reconstruction with a geometry-aware diffusion refinement. A geometry-conditioned diffusion module preserves structural fidelity while enabling realistic extrapolation beyond observed viewpoints, and a closed-loop co-training scheme distills generative priors back into the reconstruction path. The approach yields state-of-the-art results on off-trajectory synthesis, with improved temporal coherence and visual realism, while remaining scalable and annotation-light. Overall, FreeGen offers a practical pathway to high-fidelity, consistent free-viewpoint driving scenes suitable for closed-loop simulation and pre-training at scale.

Abstract

Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide consistent off-trajectory observations, limiting large-scale evaluation and training. While recent generative models demonstrate strong visual realism, they struggle to jointly achieve interpolation consistency and extrapolation realism without per-scene optimization. To address this, we propose FreeGen, a feed-forward reconstruction-generation co-training framework for free-viewpoint driving scene synthesis. The reconstruction model provides stable geometric representations to ensure interpolation consistency, while the generation model performs geometry-aware enhancement to improve realism at unseen viewpoints. Through co-training, generative priors are distilled into the reconstruction model to improve off-trajectory rendering, and the refined geometry in turn offers stronger structural guidance for generation. Experiments demonstrate that FreeGen achieves state-of-the-art performance for free-viewpoint driving scene synthesis.

Paper Structure

This paper contains 11 sections, 6 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Complementary strengths of reconstruction and generation methods. Reconstruction maintains geometry but lacks realistic textures, whereas generation improves realism but often distorts geometry. Our method combines both, achieving consistent and realistic results.
  • Figure 2: Overview of the proposed FreeGen. The reconstruction model encodes multi-view inputs into Gaussian features and decodes them into 3DGS representations. The rendered views and corresponding opacity maps are then used to guide the geometry-aware diffusion model, which performs fine-grained refinement. For clarity, depth maps are omitted from the illustration.
  • Figure 3: Illustration of the Co-Training strategy. FreeGen adopts a closed-loop co-training strategy between the Gaussian reconstruction model and the diffusion refinement model. Novel trajectories (Traj.) are sampled and rendered by the Gaussian model, then refined by the diffusion model to form pseudo-supervision. The refined results are fed back to the Gaussian model as targets for the reconstruction loss, while the diffusion model is trained with the generation loss on the original trajectory.
  • Figure 4: Qualitative Results under Spatial Viewpoint Shifts. We show generation results under lateral shifts of $1\,\text{m}$ and $2\,\text{m}$ to the left and right with six input views. Our method generates high-quality and detailed scenes while maintaining consistency across different viewpoints.
  • Figure 5: Qualitative Results of Temporal Consistency. We show three consecutive frames synthesized under a $1\,\text{m}$ lateral shift. For each timestamp, we present the ground-truth recorded image, the rendered views from the reconstruction model, and the refined results. The refined sequences remain temporally stable and preserve object geometry across time, highlighting the strong temporal consistency achieved by our method.
  • ...and 2 more figures
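
The closed-loop data flow described in the Figure 3 caption, together with the lateral viewpoint shifts of Figure 4, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `shift_pose_lateral`, `co_training_step`, `recon_render`, and `diffusion_refine` are hypothetical, poses are assumed to be 4x4 camera-to-world matrices whose first rotation column is the camera's right axis, and both losses are written as plain mean squared error because the page does not state the actual loss terms.

```python
import numpy as np


def shift_pose_lateral(cam_to_world, offset_m):
    """Translate a 4x4 camera-to-world pose along the camera's own x axis.

    Assumption: column 0 of the rotation block is the camera "right" axis
    expressed in world coordinates, so a positive offset moves right.
    """
    shifted = cam_to_world.copy()
    right_axis = cam_to_world[:3, 0]
    shifted[:3, 3] += offset_m * right_axis
    return shifted


def mse(a, b):
    """Mean squared error between two arrays (stand-in for the real losses)."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def co_training_step(recon_render, diffusion_refine,
                     recorded_poses, recorded_images, offset_m):
    """One pass of the closed loop, returning (recon_loss, gen_loss).

    recon_render(pose) -> image      hypothetical renderer of the Gaussian model
    diffusion_refine(image) -> image hypothetical refinement model
    """
    # Off-trajectory branch: sample novel poses, render them, refine the
    # renders, and use the refined images as pseudo-supervision.
    novel_poses = [shift_pose_lateral(p, offset_m) for p in recorded_poses]
    novel_renders = [recon_render(p) for p in novel_poses]
    pseudo_targets = [diffusion_refine(r) for r in novel_renders]
    recon_loss = np.mean([mse(r, t)
                          for r, t in zip(novel_renders, pseudo_targets)])

    # On-trajectory branch: the refinement model is supervised by the
    # recorded images, for which ground truth exists.
    recorded_renders = [recon_render(p) for p in recorded_poses]
    gen_loss = np.mean([mse(diffusion_refine(r), img)
                        for r, img in zip(recorded_renders, recorded_images)])
    return float(recon_loss), float(gen_loss)


if __name__ == "__main__":
    # Toy stand-ins: the "render" is a constant image equal to the camera's
    # world x position, and "refinement" adds a fixed offset.
    toy_render = lambda pose: np.full((2, 2), pose[0, 3])
    toy_refine = lambda image: image + 0.5
    losses = co_training_step(toy_render, toy_refine,
                              [np.eye(4)], [np.zeros((2, 2))], offset_m=1.0)
    print(losses)
```

In a real training setup the two losses would update different models, and the pseudo-targets would typically be treated as fixed when updating the reconstruction model; the NumPy sketch only traces which images are compared with which.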