Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Jay Zhangjie Wu; Yuxuan Zhang; Haithem Turki; Xuanchi Ren; Jun Gao; Mike Zheng Shou; Sanja Fidler; Zan Gojcic; Huan Ling

TL;DR

Difix3D+ introduces a unified pipeline that leverages a single-step diffusion model, Difix, both to improve 3D reconstructions from NeRF and 3D Gaussian Splatting and to provide real-time post-render enhancement. By distilling improved novel views back into the 3D representation through a progressive 3D update process and applying a fast post-render refinement, the approach achieves strong multi-view consistency and perceptual quality while remaining computationally efficient. The method demonstrates notable improvements in FID and PSNR across in-the-wild and automotive driving datasets, and is compatible with both implicit and explicit 3D representations, enabling practical deployment. Overall, Difix3D+ offers a fast, diffusion-prior-based solution to persistent artifacts in 3D novel-view synthesis, with potential for real-time applications and scalability to large scenes.

Abstract

Neural Radiance Fields and 3D Gaussian Splatting have revolutionized 3D reconstruction and novel-view synthesis. However, achieving photorealistic rendering from extreme novel viewpoints remains challenging, as artifacts persist across representations. In this work, we introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis through single-step diffusion models. At the core of our approach is Difix, a single-step image diffusion model trained to enhance rendered novel views and remove the artifacts caused by underconstrained regions of the 3D representation. Difix serves two critical roles in our pipeline. First, it is used during the reconstruction phase to clean up pseudo-training views that are rendered from the reconstruction and then distilled back into 3D. This greatly enhances underconstrained regions and improves the overall 3D representation quality. More importantly, Difix also acts as a neural enhancer during inference, effectively removing residual artifacts arising from imperfect 3D supervision and the limited capacity of current reconstruction models. Difix3D+ is a general solution, a single model compatible with both NeRF and 3DGS representations, and it achieves an average 2$\times$ improvement in FID score over baselines while maintaining 3D consistency.
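
The reconstruction-phase role described above (render pseudo-training views, clean them with Difix, distill them back into 3D) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names `interpolate_pose` and `progressive_update`, the callables `render`, `fix`, and `fit`, the use of plain position vectors as camera poses, the linear pose schedule, and the default of three rounds.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pose = List[float]  # simplification: a camera pose is just a position vector


def interpolate_pose(src: Sequence[float], dst: Sequence[float], t: float) -> Pose:
    """Linearly blend two positions; t=0 gives src, t=1 gives dst."""
    return [(1.0 - t) * a + t * b for a, b in zip(src, dst)]


def progressive_update(
    train_poses: Sequence[Sequence[float]],
    target_poses: Sequence[Sequence[float]],
    render: Callable[[Pose], Any],   # hypothetical: renders the current 3D model
    fix: Callable[[Any], Any],       # hypothetical: stands in for the single-step enhancer
    fit: Callable[[List[Tuple[Pose, Any]]], None],  # hypothetical: refits the 3D model
    num_rounds: int = 3,
) -> List[Tuple[Pose, Any]]:
    """Move pseudo-view cameras step by step from training poses toward targets.

    Each round renders views at the current intermediate poses, cleans them,
    adds them to the pseudo-training set, and refits the 3D representation.
    """
    pseudo_views: List[Tuple[Pose, Any]] = []
    for k in range(1, num_rounds + 1):
        t = k / num_rounds  # fraction of the way toward the target poses
        for src, dst in zip(train_poses, target_poses):
            pose = interpolate_pose(src, dst, t)
            cleaned = fix(render(pose))           # remove artifacts from the render
            pseudo_views.append((pose, cleaned))  # keep it as extra supervision
        fit(pseudo_views)  # distill the cleaned views back into 3D
    return pseudo_views
```

In this sketch the pseudo-training set grows by one view per camera pair each round, so the 3D model is always refit on views that lie only slightly beyond what it has already been supervised on. A real system would interpolate full camera poses (rotation as well as translation) and would combine the pseudo-views with the original training images.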
