Bootstrap-GS: Self-Supervised Augmentation for High-Fidelity Gaussian Splatting

Yifei Gao; Kerui Ren; Jie Ou; Lei Wang; Jiaji Wu; Jun Cheng

TL;DR

Bootstrap-GS tackles the ill-posed nature of 3D reconstruction in Gaussian Splatting by addressing training sampling deficiency through a self-supervised bootstrapping framework. It synthesizes pseudo-ground-truth novel-view renderings from partially reconstructed scenes, regenerates them with a diffusion model, and reintegrates them into training via a hybrid bootstrapping loss that averages over multiple bootstrap views defined by time-step schedules $T_s=[t_1,\ldots,t_{n\times s_b}]$. The approach includes selective region modification, multi-view consistency through controlled Gaussian primitive cloning, diffusion-variance control, and image-to-image finetuning strategies, achieving quantitative gains in PSNR, SSIM, and LPIPS while reducing the number and volume of Gaussians. It is plug-and-play and broadly applicable to Gaussian-Splatting-based methods, enabling better performance on real-world datasets and large-scale indoor scenes with improved novel-view fidelity and artifact suppression.
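To make the idea of a loss averaged over several bootstrap views concrete, here is a minimal sketch. It is not the paper's formulation: the mixing weight `lam`, the use of a plain L1 term, and the names `l1_loss` and `hybrid_bootstrap_loss` are all assumptions made for illustration, and images are represented as flat lists of floats rather than tensors.

```python
from typing import Sequence

Image = Sequence[float]  # flattened pixel values; a stand-in for a real image tensor


def l1_loss(rendered: Image, target: Image) -> float:
    """Mean absolute difference between two equally sized, non-empty images."""
    if len(rendered) != len(target) or len(rendered) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)


def hybrid_bootstrap_loss(
    train_render: Image,
    train_gt: Image,
    boot_renders: Sequence[Image],
    boot_pseudo_gts: Sequence[Image],
    lam: float = 0.5,  # hypothetical mixing weight, not a value from the paper
) -> float:
    """Blend the training-view loss with the mean loss over bootstrap views."""
    if len(boot_renders) != len(boot_pseudo_gts):
        raise ValueError("each bootstrap render needs one pseudo-ground-truth image")
    train_term = l1_loss(train_render, train_gt)
    if not boot_renders:
        # No bootstrap views for this iteration: fall back to the ordinary loss.
        return train_term
    # Average the per-view losses so the bootstrap term does not grow with view count.
    boot_term = sum(
        l1_loss(r, p) for r, p in zip(boot_renders, boot_pseudo_gts)
    ) / len(boot_renders)
    return (1.0 - lam) * train_term + lam * boot_term
```

In this sketch, `boot_pseudo_gts` stands for the diffusion-regenerated images of the bootstrapped cameras; averaging over them keeps the contribution of the pseudo-ground truth bounded regardless of how many bootstrap views are used per training camera.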

Abstract

Recent advancements in 3D Gaussian Splatting (3D-GS) have established new benchmarks for rendering quality and efficiency in 3D reconstruction. However, 3D-GS faces critical limitations when generating novel views that significantly deviate from those encountered during training. Moreover, issues such as dilation and aliasing arise during zoom operations. These challenges stem from a fundamental issue: training sampling deficiency. In this paper, we introduce a bootstrapping framework to address this problem. Our approach synthesizes pseudo-ground truth from novel views that align with the limited training set and reintegrates these synthesized views into the training pipeline. Experimental results demonstrate that our bootstrapping technique not only reduces artifacts but also improves quantitative metrics. Furthermore, our technique is highly adaptable, allowing various Gaussian-based methods to benefit from its integration.

Paper Structure (43 sections, 7 equations, 6 figures, 17 tables)

Introduction
Related Work
Novel View Synthesis
Diffusion-based Sparse View Reconstruction
Preliminaries
3D Gaussian Splatting
Diffusion Model
Method
Motivation and Challenge
Selective Region Modification.
Multi-view Consistency.
Bootstrap Design
Overall Diffusion Variance Control.
Bootstrap Pipeline.
Diffusion Finetuning Strategies.
...and 28 more sections

Figures (6)

Figure 1: By addressing the common issue of training sampling deficiency in 3D reconstruction, our bootstrap technique significantly reduces artifacts in novel-view renderings and enables 3D-GS to render superior results with clear and more structured point clouds.
Figure 2: Overall pipeline. (a) To simulate the artifacts present in novel-view renderings, we first train a 3D-GS model using only half of the training set with limited time steps and then render the remaining half. This process is then repeated for the other half of the dataset. Finally, we obtain the fine-tuning data for diffusion models, where the renderings serve as $x_t$ and the ground truth as $x_0$. (b) For each training camera, we bootstrap several novel-view cameras and then acquire their corresponding renderings. After diffusion regeneration, these renderings are reintegrated into the training process, where multiple bootstrap cameras, along with a single training camera, are used to compute the bootstrapping loss for each training iteration.
Figure 3: 3D-GS cloning process. Only the gradients of Gaussian primitives aligned in nearly one direction have the potential to exceed the gradient threshold required for triggering further cloning.
Figure 4: Main comparisons. Our bootstrapping pipeline successfully assisted the original baseline in denoising, enhancing details, filling in gaps, restoring distortions, and eliminating high-noise Gaussian primitives in novel views.
Figure 5: Performance comparison on Tanks&Temples datasets with extended training time. While 3D-GS struggles to make further progress and even experiences a decline in performance, bootstrapping consistently enhances performance. The iterations are measured relative to the training process of 3D-GS.
...and 1 more figures
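
The data preparation described in the Figure 2(a) caption (train on one half of the views, render the other half, then swap) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `train_fn` and `render_fn` are hypothetical callables standing in for a short-budget 3D-GS training run and its renderer, and splitting the views into a first and second half by index is an arbitrary choice made here.

```python
from typing import Any, Callable, List, Sequence, Tuple

View = Tuple[Any, Any]  # (camera, ground-truth image)
Pair = Tuple[Any, Any]  # (artifact-laden rendering as x_t, ground truth as x_0)


def build_finetune_pairs(
    views: Sequence[View],
    train_fn: Callable[[Sequence[View]], Any],  # hypothetical: trains a model on some views
    render_fn: Callable[[Any, Any], Any],       # hypothetical: renders a camera with a model
) -> List[Pair]:
    """Collect (rendering, ground truth) pairs from two-fold held-out rendering."""
    mid = len(views) // 2
    first, second = views[:mid], views[mid:]
    pairs: List[Pair] = []
    # Each half is used once for training and once as the held-out set.
    for train_half, held_out in ((first, second), (second, first)):
        model = train_fn(train_half)
        for camera, ground_truth in held_out:
            # The model never saw this camera, so the rendering carries
            # novel-view artifacts; pair it with the clean image.
            pairs.append((render_fn(model, camera), ground_truth))
    return pairs
```

Because every view is rendered only by the model that did not train on it, each pair shows the diffusion model the kind of degradation it is later asked to repair.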
