Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann

TL;DR

This work addresses overfitting in 3D Gaussian Splatting (3DGS) for few-shot novel view synthesis by introducing Self-Ensembling Gaussian Splatting (SE-GS). SE-GS trains a non-perturbed Σ-model alongside a Δ-model whose parameters are perturbed in an uncertainty-guided manner, creating diverse yet reliable supervision from pseudo-views; a photometric regularization leverages this ensemble to improve generalization. Across LLFF, DTU, Mip-NeRF360, and MVImgNet, SE-GS achieves consistent improvements in PSNR, SSIM, and LPIPS under sparse-view conditions, outperforming prior methods with efficient training. The approach offers a practical, self-supervised regularization strategy for radiance-field models, enabling robust NVS with limited input views and broad applicability to 3D scene synthesis tasks.

Abstract

3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in novel view synthesis (NVS). However, 3DGS tends to overfit when trained with sparse views, limiting its generalization to novel viewpoints. In this paper, we address this overfitting issue by introducing Self-Ensembling Gaussian Splatting (SE-GS). We achieve self-ensembling by incorporating an uncertainty-aware perturbation strategy during training. A $\mathbf{\Delta}$-model and a $\mathbf{\Sigma}$-model are jointly trained on the available images. The $\mathbf{\Delta}$-model is dynamically perturbed based on rendering uncertainty across training steps, generating diverse perturbed models with negligible computational overhead. Discrepancies between the $\mathbf{\Sigma}$-model and these perturbed models are minimized throughout training, forming a robust ensemble of 3DGS models. This ensemble, represented by the $\mathbf{\Sigma}$-model, is then used to generate novel-view images during inference. Experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets demonstrate that our approach enhances NVS quality under few-shot training conditions, outperforming existing state-of-the-art methods. The code is released at: https://sailor-z.github.io/projects/SEGS.html.

Paper Structure

This paper contains 13 sections, 14 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Qualitative results of our SE-GS and state-of-the-art approaches. The models are trained on sparse views and the images rendered from novel views are shown. As highlighted in the zoomed-in patches, our SE-GS captures finer details and produces fewer artifacts for novel views when trained on few-shot images.
  • Figure 2: Overfitting in 3D Gaussian Splatting with sparse training views. (a) and (b) illustrate the performance of 3DGS on training and testing views, respectively. Each curve represents the PSNR values across training iterations.
  • Figure 3: Pipeline of the presented SE-GS. We tackle the overfitting problem in sparse-view scenarios by incorporating a self-ensembling mechanism into 3DGS. We jointly train a $\mathbf{\Delta}$-model and a $\mathbf{\Sigma}$-model. During training, we store pseudo-view renderings of the $\mathbf{\Delta}$-model in buffers, from which we compute pixel-level uncertainties. The Gaussians of the $\mathbf{\Delta}$-model that overlap pixels with high uncertainty (highlighted as red ellipses) are perturbed, which yields a perturbed model. We then achieve self-ensembling by penalizing the discrepancies between the $\mathbf{\Sigma}$-model and the perturbed models. During inference, the resulting ensemble, the $\mathbf{\Sigma}$-model, is used for novel view synthesis.
  • Figure 4: Buffer update during training. For each sampled pseudo view, we dynamically update the buffer storing the images rendered at different training steps. For instance, at training step $t_{T}$, the oldest image $\mathbf{I}_{T-S}$ in the buffer is popped, and the new image $\mathbf{I}_{T}$ is pushed into the buffer. An uncertainty map $\mathbf{U}^{t_{T}}$ is computed from the current buffer and is then used to determine the perturbation that yields a new 3DGS model.
  • Figure 5: Qualitative results. The methods are trained on sparse views and the renderings of novel views are illustrated. The images are from the DTU and MVImgNet datasets.
  • ...and 6 more figures
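
The buffer update described in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' implementation: the class name `PseudoViewBuffer`, the helper `high_uncertainty_mask`, the threshold value, and the choice of per-pixel standard deviation as the uncertainty measure are all assumptions made for illustration. The captions only state that the buffer holds renderings from different training steps, that the oldest one is popped when a new one is pushed, and that a pixel-level uncertainty map is computed from the buffer. Images are plain nested lists of grayscale values to keep the example dependency-free.

```python
from collections import deque
from statistics import pstdev


class PseudoViewBuffer:
    """FIFO buffer holding the most recent renderings of one pseudo view.

    Hypothetical helper: the name and interface are illustrative only.
    """

    def __init__(self, size):
        # deque(maxlen=size) discards the oldest entry automatically
        # once a new rendering is appended to a full buffer.
        self.images = deque(maxlen=size)

    def push(self, image):
        """Store the rendering produced at the current training step."""
        self.images.append(image)

    def uncertainty_map(self):
        """Per-pixel standard deviation across the buffered renderings.

        Assumption: standard deviation stands in for whatever uncertainty
        measure the paper actually defines.
        """
        if len(self.images) < 2:
            raise ValueError("need at least two buffered renderings")
        height = len(self.images[0])
        width = len(self.images[0][0])
        return [
            [pstdev([img[r][c] for img in self.images]) for c in range(width)]
            for r in range(height)
        ]


def high_uncertainty_mask(uncertainty, threshold):
    """Mark pixels whose uncertainty exceeds a (hypothetical) threshold."""
    return [[value > threshold for value in row] for row in uncertainty]


if __name__ == "__main__":
    # Four toy 1x2 renderings of the same pseudo view at successive steps.
    renderings = [[[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 0.0]], [[0.0, 1.0]]]
    buffer = PseudoViewBuffer(size=3)
    for image in renderings:
        buffer.push(image)  # the first rendering is dropped on the 4th push
    mask = high_uncertainty_mask(buffer.uncertainty_map(), threshold=0.1)
    print(mask)
```

In this toy run the first pixel never changes across steps, so it is not flagged, while the second pixel fluctuates and is flagged. In the pipeline of Figure 3, such a mask would select the pixels whose overlapping Gaussians in the $\mathbf{\Delta}$-model are perturbed; the perturbation itself is not sketched here.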