GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

Xingyilang Yin; Qi Zhang; Jiahao Chang; Ying Feng; Qingnan Fan; Xi Yang; Chi-Man Pun; Huaqi Zhang; Xiaodong Cun

TL;DR

GSFixer tackles artifacts in 3D Gaussian Splatting under sparse views by introducing a reference-guided video restoration model that conditions on both 2D semantic cues and 3D geometric priors from reference views. The method uses a DiT-based video diffusion backbone, augmented with fusion tokens from VGGT and DINOv2, and a reference-guided trajectory to restore artifact-free novel views and refine the underlying 3DGS representation. It also provides the DL3DV-Res benchmark to evaluate artifact restoration in 3DGS contexts. Experiments show that GSFixer achieves state-of-the-art performance in artifact restoration and sparse-view reconstruction, with strong generalization to out-of-domain scenes such as Mip-NeRF 360.

Abstract

Reconstructing 3D scenes with 3D Gaussian Splatting (3DGS) from sparse views is an ill-posed problem: the inputs carry insufficient information, which often results in noticeable artifacts. While recent approaches have sought to leverage generative priors to fill in under-constrained regions, they struggle to generate content that remains consistent with the input observations. To address this challenge, we propose GSFixer, a novel framework designed to improve the quality of 3DGS representations reconstructed from sparse inputs. The core of our approach is a reference-guided video restoration model, built upon a DiT-based video diffusion model and trained on paired artifact-laden 3DGS renders and clean frames with additional reference-based conditions. Treating the sparse input views as references, our model integrates both 2D semantic features and 3D geometric features of the reference views, extracted from a visual geometry foundation model, to enhance semantic coherence and 3D consistency when fixing artifact-laden novel views. Furthermore, given the lack of suitable benchmarks for evaluating 3DGS artifact restoration, we present DL3DV-Res, which contains artifact frames rendered using low-quality 3DGS. Extensive experiments demonstrate that GSFixer outperforms current state-of-the-art methods in 3DGS artifact restoration and sparse-view 3D reconstruction. Project page: https://github.com/GVCLab/GSFixer.
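
To make the reference-conditioning idea more concrete, the following is a minimal sketch of how per-patch 2D semantic tokens and 3D geometric tokens from the reference views might be fused and placed alongside the video tokens of a diffusion transformer. It is an illustration under stated assumptions, not the GSFixer implementation: the function names `fuse_reference_tokens` and `build_condition_sequence`, the concatenate-then-project fusion, the token shapes, and the choice to prepend reference tokens to the sequence are all hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def fuse_reference_tokens(semantic, geometric, weight):
    """Fuse per-patch semantic and geometric tokens (hypothetical scheme).

    semantic:  (num_refs, num_patches, d_sem)  stand-in for 2D semantic features
    geometric: (num_refs, num_patches, d_geo)  stand-in for 3D geometric features
    weight:    (d_sem + d_geo, d_model)        assumed linear projection
    """
    if semantic.shape[:2] != geometric.shape[:2]:
        raise ValueError("semantic and geometric tokens must share views and patches")
    # Concatenate along the channel axis, then project to the model width.
    joint = np.concatenate([semantic, geometric], axis=-1)
    return joint @ weight


def build_condition_sequence(video_tokens, fused):
    """Prepend flattened reference tokens to the video tokens (assumed layout)."""
    ref_seq = fused.reshape(-1, fused.shape[-1])
    return np.concatenate([ref_seq, video_tokens], axis=0)


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
semantic = rng.standard_normal((2, 16, 8))     # 2 reference views, 16 patches
geometric = rng.standard_normal((2, 16, 12))
weight = rng.standard_normal((20, 32)) * 0.1   # (8 + 12) -> 32 channels
video_tokens = rng.standard_normal((64, 32))   # tokens of the artifact video

fused = fuse_reference_tokens(semantic, geometric, weight)
sequence = build_condition_sequence(video_tokens, fused)
print(fused.shape, sequence.shape)
```

In this sketch the reference tokens occupy the first 32 positions of the sequence and the video tokens follow unchanged, so attention layers in a transformer backbone could, in principle, read reference information at every denoising step. Other fusion choices, such as cross-attention or additive injection, would fit the abstract's description equally well.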
