GenFusion: Closing the Loop between Reconstruction and Generation via Videos

Sibo Wu; Congrong Xu; Binbin Huang; Andreas Geiger; Anpei Chen

TL;DR

GenFusion addresses the conditioning gap between 3D reconstruction, which typically needs densely captured views, and 3D generation, which typically starts from a single view or none. It introduces a reconstruction-driven video diffusion model and a cyclic fusion loop that feeds generated content back to regularize reconstruction. The method uses masked 3D reconstruction to create artifact-prone training data, depth-aware RGB-D VAE conditioning, and a diffusion-guided feedback mechanism to expand viewpoint coverage and mitigate viewpoint saturation. The authors report improved sparse-view novel view synthesis, robust view extrapolation, and scene completion across diverse datasets, suggesting a practical path toward artifact-free 3D asset generation and scalable content augmentation using video priors.

Abstract

Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap separates the two fields: scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single input view or none at all, which significantly limits the applications of both. We find that this gap stems from a misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, which includes view synthesis from sparse views and from masked inputs, validates the effectiveness of our approach. More details at https://genfusion.sibowu.com.
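The cyclical fusion idea described in the abstract can be illustrated with a toy control-flow sketch. This is not the authors' implementation: `Scene`, `fit`, `render_rgbd`, `restore`, and `cyclic_fusion` are hypothetical stand-ins, poses are plain tuples, and frames are dictionaries carrying only an `artifact` flag. The sketch shows only the loop structure (fit, render novel views, restore them, add them to the training set), under the assumption that restored frames are treated as extra supervision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float, float]      # hypothetical camera position
Frame = Dict[str, object]              # placeholder for an RGB-D frame
View = Tuple[Pose, Frame]


@dataclass
class Scene:
    """Toy stand-in for a 3D representation: it only remembers supervised poses."""
    supervised: List[Pose] = field(default_factory=list)


def fit(scene: Scene, training_set: List[View]) -> Scene:
    # Stand-in for optimizing the 3D representation on the current training set.
    scene.supervised = [pose for pose, _ in training_set]
    return scene


def render_rgbd(scene: Scene, pose: Pose) -> Frame:
    # Stand-in for rendering: poses without supervision are marked artifact-prone.
    return {"pose": pose, "artifact": pose not in scene.supervised}


def restore(frames: List[Frame]) -> List[Frame]:
    # Stand-in for the video diffusion model that restores artifact-prone frames.
    return [{"pose": f["pose"], "artifact": False} for f in frames]


def cyclic_fusion(
    input_views: List[View],
    sample_novel_poses: Callable[[int], List[Pose]],
    rounds: int,
) -> Tuple[Scene, List[View]]:
    training_set = list(input_views)
    scene = Scene()
    for r in range(rounds):
        scene = fit(scene, training_set)               # reconstruct from current views
        novel = sample_novel_poses(r)                  # poses outside current coverage
        rendered = [render_rgbd(scene, p) for p in novel]
        restored = restore(rendered)                   # generative restoration
        training_set.extend((f["pose"], f) for f in restored)  # close the loop
    return fit(scene, training_set), training_set
```

Each round enlarges the training set with restored frames at new poses, which is the progressive expansion the abstract refers to; how poses are sampled and how restored frames are weighted are left open here.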
