GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors

Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, Xiaodong Cun

TL;DR

GSFixer tackles artifacts in 3D Gaussian Splatting under sparse views by introducing a reference-guided video restoration model that conditions on both 2D semantic cues and 3D geometric priors from reference views. The method uses a DiT-based video diffusion backbone, augmented with fusion tokens from VGGT and DINOv2, and a reference-guided trajectory to restore artifact-free novel views and refine the underlying 3DGS representation. It also provides the DL3DV-Res benchmark to evaluate artifact restoration in 3DGS contexts. Experiments show that GSFixer achieves state-of-the-art performance in artifact restoration and sparse-view reconstruction, with strong generalization to out-of-domain scenes such as Mip-NeRF 360.

Abstract

Reconstructing 3D scenes using 3D Gaussian Splatting (3DGS) from sparse views is an ill-posed problem due to insufficient information, often resulting in noticeable artifacts. While recent approaches have sought to leverage generative priors to complete information for under-constrained regions, they struggle to generate content that remains consistent with input observations. To address this challenge, we propose GSFixer, a novel framework designed to improve the quality of 3DGS representations reconstructed from sparse inputs. The core of our approach is the reference-guided video restoration model, built upon a DiT-based video diffusion model trained on paired artifact-ridden 3DGS renders and clean frames with additional reference-based conditions. Considering the input sparse views as references, our model integrates both 2D semantic features and 3D geometric features of reference views extracted from the visual geometry foundation model, enhancing the semantic coherence and 3D consistency when fixing artifact-prone novel views. Furthermore, considering the lack of suitable benchmarks for 3DGS artifact restoration evaluation, we present DL3DV-Res, which contains artifact frames rendered using low-quality 3DGS. Extensive experiments demonstrate that GSFixer outperforms current state-of-the-art methods in 3DGS artifact restoration and sparse-view 3D reconstruction. Project page: https://github.com/GVCLab/GSFixer.

Paper Structure

This paper contains 23 sections, 9 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: We introduce GSFixer, a framework capable of improving 3DGS in both artifact restoration (top) and 3D reconstruction (bottom) under sparse-view settings. Recent generative methods struggle with maintaining consistency between generated and input views. GSFixer guides the video diffusion model conditioned on both 3D and 2D signals to enhance consistency in novel view restoration, thereby improving 3D reconstruction quality.
  • Figure 2: Pipeline of GSFixer. Given sparse-view images and their corresponding low-quality 3DGS representation, we render artifact-prone novel views between two reference views along a reference-guided trajectory. These novel views are fed into the reference-guided video restoration model to correct artifacts, and the fixed novel views are then distilled back into the 3DGS representation to improve its quality. The restoration network is finetuned from CogVideoX and trained on paired artifact-ridden 3DGS renders and ground-truth frames. It is additionally conditioned on 3D geometric tokens and 2D semantic tokens extracted from the reference views using pretrained VGGT and DINOv2 encoders, respectively.
  • Figure 3: Illustration of different trajectories. (a) Interpolation trajectory: blue curve. (b) Ellipse trajectory: green curve. (c) Reference-guided trajectory: orange and green curves.
  • Figure 4: Qualitative comparison on the DL3DV-Res benchmark. We compare the 3DGS artifact restoration quality of existing generative methods.
  • Figure 5: Qualitative comparison on the DL3DV benchmark. We compare novel-view rendering quality against baselines using 3, 6, and 9 input views.
  • ...and 4 more figures
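
The render, restore, distill loop described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration of the control flow, not the authors' implementation: the function name `refine_scene`, the three callables `render`, `restore`, and `distill`, and the `rounds` parameter are all hypothetical, and the toy stand-ins at the bottom exist solely to make the sketch runnable without a 3DGS renderer or a diffusion model.

```python
from typing import Callable, List, Sequence, TypeVar

Scene = TypeVar("Scene")
Pose = TypeVar("Pose")
Frame = TypeVar("Frame")


def refine_scene(
    scene: Scene,
    reference_views: Sequence[Frame],
    trajectory: Sequence[Pose],
    render: Callable[[Scene, Pose], Frame],
    restore: Callable[[List[Frame], Sequence[Frame]], List[Frame]],
    distill: Callable[[Scene, Sequence[Pose], List[Frame]], Scene],
    rounds: int = 1,  # assumption: the source does not state an iteration count
) -> Scene:
    """Hypothetical render -> restore -> distill loop (illustrative only)."""
    for _ in range(rounds):
        # 1. Render artifact-prone novel views along the camera trajectory.
        degraded = [render(scene, pose) for pose in trajectory]
        # 2. Restore the clip as a whole, conditioned on the reference views.
        fixed = restore(degraded, reference_views)
        if len(fixed) != len(degraded):
            raise ValueError("restore must return one frame per rendered pose")
        # 3. Fit the scene representation to the restored frames.
        scene = distill(scene, trajectory, fixed)
    return scene


# Toy stand-ins that only record the call order.
calls: List[str] = []


def toy_render(scene: int, pose: float) -> tuple:
    calls.append("render")
    return (scene, pose)


def toy_restore(frames: list, refs: Sequence[str]) -> list:
    calls.append("restore")
    return list(frames)


def toy_distill(scene: int, poses: Sequence[float], frames: list) -> int:
    calls.append("distill")
    return scene + 1  # pretend each round improves the scene by one step


result = refine_scene(
    0, ["ref_a", "ref_b"], [0.25, 0.5, 0.75],
    toy_render, toy_restore, toy_distill, rounds=2,
)
```

Passing the three stages in as callables keeps the sketch independent of any particular renderer or restoration network. In a real system `restore` would wrap the video diffusion model together with its reference-view conditioning, and `distill` would run 3DGS optimization against the restored frames.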