FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training

Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, Theo Gevers

TL;DR

This work proposes a multi-stage training scheme with matching-based consistency constraints imposed on the novel views without relying on pre-trained depth estimation or diffusion models, and introduces a locality preserving regularization for 3D Gaussians which removes rendering artifacts by preserving the local color structure of the scene.

Abstract

The field of novel view synthesis from images has seen rapid advancements with the introduction of Neural Radiance Fields (NeRF) and, more recently, 3D Gaussian Splatting. Gaussian Splatting has become widely adopted due to its efficiency and ability to render novel views accurately. While Gaussian Splatting performs well when a sufficient number of training images is available, its unstructured explicit representation tends to overfit in scenarios with sparse input images, resulting in poor rendering performance. To address this, we present a 3D Gaussian-based novel view synthesis method using sparse input images that can accurately render the scene from viewpoints not covered by the training images. We propose a multi-stage training scheme with matching-based consistency constraints imposed on the novel views without relying on pre-trained depth estimation or diffusion models. This is achieved by using the matches of the available training images to supervise the generation of the novel views sampled between the training frames with color, geometry, and semantic losses. In addition, we introduce a locality preserving regularization for 3D Gaussians which removes rendering artifacts by preserving the local color structure of the scene. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance of our method in few-shot novel view synthesis compared to existing state-of-the-art methods.
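To make the multi-stage idea concrete, here is a minimal sketch of how a training loop might decide which stage an iteration belongs to and which loss terms are active in it. The iteration counts, the class and function names, and the loss labels are all hypothetical, chosen only for illustration. Keeping the known-view terms switched on during the intermediate stage is also an assumption; the text only states that the first and last stages use the known input views alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StageSchedule:
    """Hypothetical iteration budget for the three training stages."""
    pretrain_iters: int
    intermediate_iters: int
    tuning_iters: int

    def stage_at(self, iteration: int) -> str:
        """Return the stage name for a 0-based iteration index."""
        if iteration < self.pretrain_iters:
            return "pretrain"
        if iteration < self.pretrain_iters + self.intermediate_iters:
            return "intermediate"
        return "tuning"


def active_losses(stage: str) -> List[str]:
    """List the loss terms that are switched on in a given stage.

    Known-view terms are kept in every stage (an assumption of this sketch).
    The matching-based novel-view terms are added only in the intermediate stage.
    """
    known_view = ["color_rerender", "opacity_reg", "locality_reg"]
    if stage == "intermediate":
        return known_view + ["match_color", "match_geometry", "match_semantic"]
    return known_view


if __name__ == "__main__":
    schedule = StageSchedule(pretrain_iters=2, intermediate_iters=3, tuning_iters=2)
    for it in range(7):
        stage = schedule.stage_at(it)
        print(it, stage, active_losses(stage))
```

Separating the schedule from the loss selection keeps the stage boundaries in one place, so changing the budget does not require touching the loss code.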

Paper Structure

This paper contains 15 sections, 15 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: FewViewGS pipeline. Our method consists of a multi-stage training scheme of (a) pre-training, (b) intermediate, and (c) tuning stages. Top right: pre-training / tuning. At the beginning and end, Gaussians are optimized solely on the known input views, utilizing color re-rendering loss and regularization terms on total opacity and local appearance. Bottom right: intermediate. Correspondences are first extracted from the pairs of training images and projected onto the virtual sampled views. Given the projected and virtual renders, color, geometry, and semantic losses are calculated at the projected pixels in the new views.
  • Figure 2: Qualitative comparison on DTU [jensen2014large] and LLFF [mildenhall2019local] datasets. The results show that RegNeRF tends to produce blurred outcomes. 3DGS and DNGaussian introduce artifacts in the novel view. In contrast, our method generates better qualitative results.
  • Figure 3: Visualizations for unseen regions. The orange regions are not observed in the training views. Compared to FSGS [zhu2025fsgs], our method generates better results and fewer artifacts.
  • Figure 4: Visualizations for predicted depth. Compared to the baseline 3DGS [kerbl20233d], our method yields more accurate depth values.
  • Figure 5: Comparison of the results with our proposed multi-view alignment.
  • ...and 4 more figures
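
The intermediate stage described in the Figure 1 caption projects correspondences from pairs of training images onto a virtual view. The sketch below illustrates one plausible reading of that step for a single match, using a plain pinhole camera model. It assumes that each matched pixel comes with a depth value (for example one rendered from the current Gaussians), which the text here does not state. The `Camera` class, the function names, and the distance-based gap are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Pinhole camera with a world-to-camera pose (hypothetical helper)."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: List[List[float]]  # 3x3 world-to-camera rotation
    t: Vec3               # world-to-camera translation


def unproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with the given depth to a 3D point in world space."""
    x_cam = ((u - cam.cx) / cam.fx * depth, (v - cam.cy) / cam.fy * depth, depth)
    d = [x_cam[i] - cam.t[i] for i in range(3)]
    # Apply R^T to go from camera coordinates back to world coordinates.
    return tuple(sum(cam.R[r][c] * d[r] for r in range(3)) for c in range(3))


def project(cam: Camera, point: Vec3) -> Tuple[float, float]:
    """Project a world-space point to pixel coordinates in the given camera."""
    x_cam = [sum(cam.R[r][c] * point[c] for c in range(3)) + cam.t[r] for r in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    return (cam.fx * x_cam[0] / x_cam[2] + cam.cx,
            cam.fy * x_cam[1] / x_cam[2] + cam.cy)


def reprojection_gap(cam_a: Camera, px_a: Tuple[float, float], depth_a: float,
                     cam_b: Camera, px_b: Tuple[float, float], depth_b: float,
                     cam_virtual: Camera) -> float:
    """Pixel distance between the two projections of one match in the virtual view.

    If both depths are consistent with the same 3D point, the gap is zero.
    """
    ua, va = project(cam_virtual, unproject(cam_a, px_a[0], px_a[1], depth_a))
    ub, vb = project(cam_virtual, unproject(cam_b, px_b[0], px_b[1], depth_b))
    return ((ua - ub) ** 2 + (va - vb) ** 2) ** 0.5


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cam_a = Camera(100.0, 100.0, 50.0, 50.0, identity, (0.0, 0.0, 0.0))
    cam_b = Camera(100.0, 100.0, 50.0, 50.0, identity, (-1.0, 0.0, 0.0))
    cam_mid = Camera(100.0, 100.0, 50.0, 50.0, identity, (-0.5, 0.0, 0.0))
    # Both pixels observe the world point (0.5, 0, 2) at depth 2.
    print(reprojection_gap(cam_a, (75.0, 50.0), 2.0, cam_b, (25.0, 50.0), 2.0, cam_mid))
```

In this reading, color and semantic terms would compare image values or features sampled at the same projected pixels, while the gap above stands in for a geometry term. The actual loss definitions are given in the paper's equations, which are not part of this summary.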