
AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction

Bi'an Du, Lingbei Meng, Wei Hu

TL;DR

This work tackles sparse-view 3D reconstruction with a self-augmented two-stage Gaussian splatting framework. It combines a coarse-to-fine Gaussian model with perceptual data augmentation from a fine-tuned 2D diffusion prior, and integrates structure-aware masks to preserve geometry under sparse observations. The approach achieves state-of-the-art perceptual quality and multi-view consistency on benchmarks such as MipNeRF360, OmniObject3D, and OpenIllumination, while improving training and inference efficiency. In practice, this enables high-fidelity 3D reconstruction from few views at reduced computational cost.

Abstract

Sparse-view 3D reconstruction is a major challenge in computer vision, aiming to create complete three-dimensional models from limited viewing angles. Key obstacles include: 1) a small number of input images with inconsistent information; 2) dependence on input image quality; and 3) large model parameter sizes. To tackle these issues, we propose a self-augmented two-stage Gaussian splatting framework enhanced with structural masks for sparse-view 3D reconstruction. Initially, our method generates a basic 3D Gaussian representation from sparse inputs and renders multi-view images. We then fine-tune a pre-trained 2D diffusion model to enhance these images, using them as augmented data to further optimize the 3D Gaussians. Additionally, a structural masking strategy during training enhances the model's robustness to sparse inputs and noise. Experiments on benchmarks like MipNeRF360, OmniObject3D, and OpenIllumination demonstrate that our approach achieves state-of-the-art performance in perceptual quality and multi-view consistency with sparse inputs.

Paper Structure (17 sections, 4 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Our method enables high-quality 3D reconstruction of sparse-view scenes with self-augmented Gaussian splatting, surpassing current SOTA methods for sparse-view 3D reconstruction both qualitatively and quantitatively.
  • Figure 2: The overall architecture of our self-augmented Gaussian splatting method. We first create a coarse 3D Gaussian model from sparse-view images, generating a coarse point cloud and renderings from novel views. Multi-view renders and the 2D prior enhance perceptual quality, with structural masks integrated into the two-stage Gaussian process.
  • Figure 3: Qualitative examples on the MipNeRF360 and OmniObject3D datasets with 4 input views.
  • Figure 4: Comparative analysis of PSNR for the 4-view and 9-view configurations across different objects and Gaussian iteration processes. Note: 'K', 'G', and 'B' represent the objects Kitchen, Garden, and Bonsai, respectively. 'C' refers to the coarse Gaussian iteration process, while 'F' denotes the fine Gaussian iteration process.
  • Figure 5: Ablation study on different augmentation strategies. “Aug” denotes augmentation, “PV” denotes perceptual view augmentation, and “M” denotes mask augmentation.
  • ...and 1 more figure
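
Figure 4 compares configurations by PSNR. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), and not the paper's evaluation code. The function name `psnr`, the `max_value` parameter, and the assumption that images are float arrays scaled to [0, 1] are illustrative choices, not details from the source.

```python
import numpy as np


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumes pixel values lie in [0, max_value]; this range is an
    illustrative assumption, not something stated in the paper.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    if reference.shape != rendered.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate a rendering closer to the reference view. Because the measure is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.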