GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting

Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

TL;DR

GaussianObject is a Gaussian splatting framework for representing and rendering 3D objects that achieves high rendering quality from only four input images. Evaluated on several challenging datasets, it significantly outperforms previous SOTA methods.

Abstract

Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views contain only very limited 3D information, leading to two significant challenges: 1) difficulty in building multi-view consistency, as images for matching are too few; 2) partially omitted or highly compressed object information, as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, with which the Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, which does not require pre-given accurate camera poses, achieves competitive quality, and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our own collection of unposed images, achieving superior performance from only four views and significantly outperforming previous SOTA methods. Our demo is available at https://gaussianobject.github.io/, and the code has been released at https://github.com/GaussianObject/GaussianObject.
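
The visual hull step mentioned in the abstract can be illustrated with a minimal sketch. It is not the released GaussianObject code: the function name `carve_visual_hull`, the use of 3x4 pinhole projection matrices, and the nearest-pixel mask lookup are all assumptions made for illustration. The idea shown is only the generic one, that a candidate 3D point survives if it projects inside the object mask in every input view.

```python
import numpy as np


def carve_visual_hull(points, projections, masks):
    """Keep candidate points that fall inside the object mask in every view.

    points:      (N, 3) array of candidate 3D positions (hypothetical input).
    projections: list of 3x4 camera projection matrices (assumed pinhole model).
    masks:       list of (H, W) boolean foreground masks, one per view.
    """
    keep = np.ones(len(points), dtype=bool)
    homog = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    for proj, mask in zip(projections, masks):
        uvw = homog @ proj.T                      # (N, 3) homogeneous pixels
        depth = uvw[:, 2]
        in_front = depth > 1e-8                   # discard points behind the camera
        safe_depth = np.where(in_front, depth, 1.0)
        u = np.round(uvw[:, 0] / safe_depth).astype(int)
        v = np.round(uvw[:, 1] / safe_depth).astype(int)
        height, width = mask.shape
        inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]  # nearest-pixel mask lookup
        keep &= hit                               # must be foreground in all views
    return points[keep]
```

With only four views the carved volume is a loose superset of the object, which is consistent with the abstract's description of the result as a coarse initialization that is refined afterwards.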

Paper Structure (36 sections, 13 equations, 19 figures, 20 tables, 1 algorithm)

Figures (19)

  • Figure 1: Overview of GaussianObject. (a) We initialize 3D Gaussians by constructing a visual hull with camera parameters and masked images, which are optimized with $\mathcal{L}_{\text{ref}}$ and refined through floater elimination. (b) We use a novel 'leave-one-out' strategy and add 3D noise to Gaussians to generate corrupted Gaussian renderings. These renderings, paired with their corresponding reference images, facilitate the training of the Gaussian repair model employing $\mathcal{L}_{\text{tune}}$. For details, please refer to Fig. 2. (c) Once trained, the Gaussian repair model is frozen and used to correct views that need to be rectified. These views are identified through distance-aware sampling. The repaired images and reference images are used to further optimize 3D Gaussians with $\mathcal{L}_{\text{rep}}$ and $\mathcal{L}_{\text{ref}}$.
  • Figure 2: Illustration of the Gaussian repair model setup. First, we add Gaussian noise $\epsilon$ to a reference image $x^{\text{ref}}$ to form a noisy image. Next, this noisy image, along with the degraded image $x^\prime$ corresponding to $x^{\text{ref}}$, is passed to a frozen pre-trained ControlNet with learnable LoRA layers to predict a noise distribution $\epsilon_{\theta}$. We use the difference between $\epsilon$ and $\epsilon_{\theta}$ to fine-tune the parameters of the LoRA layers.
  • Figure 3: Illustration of our distance-aware sampling. Blue and red indicate the reference and repair path, respectively.
  • Figure 4: Qualitative examples on the MipNeRF360 and OmniObject3D datasets with 4 input views. Many methods fail to reach a coherent 3D representation, resulting in floaters and disjoint pixel patches. A pure white image indicates that the corresponding method missed the object entirely, usually because it overfit the input images.
  • Figure 5: Qualitative results on the OpenIllumination dataset. Although ZeroRF shows competitive PSNR and SSIM, its renderings often appear blurred. GaussianObject, by contrast, restores fine details better, achieving a significant perceptual quality advantage.
  • ...and 14 more figures
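
The captions of Figures 1(b) and 2 describe a leave-one-out split over the reference views and a noise-prediction objective for tuning the repair model. The sketch below illustrates both under stated assumptions; it is not taken from the paper's code. The helper names are hypothetical, the forward noising uses the standard DDPM form $x_t = \sqrt{\bar{\alpha}_t}\,x^{\text{ref}} + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ (the captions only say that Gaussian noise is added), and the mean squared error stands in for "the difference between $\epsilon$ and $\epsilon_{\theta}$", whose exact form the captions do not specify.

```python
import numpy as np


def leave_one_out_splits(num_views):
    """Return (training view indices, held-out view index) for each view."""
    return [
        ([j for j in range(num_views) if j != i], i)
        for i in range(num_views)
    ]


def add_noise(x_ref, eps, alpha_bar_t):
    """Noise a reference image; assumes the standard DDPM forward process."""
    return np.sqrt(alpha_bar_t) * x_ref + np.sqrt(1.0 - alpha_bar_t) * eps


def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between true and predicted noise (assumed loss form)."""
    return float(np.mean((eps_pred - eps) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_ref = rng.random((8, 8, 3))          # stand-in for a reference image
    eps = rng.standard_normal(x_ref.shape)  # sampled Gaussian noise
    x_noisy = add_noise(x_ref, eps, alpha_bar_t=0.5)
    # A real setup would pass x_noisy and the degraded rendering x' to the
    # ControlNet with LoRA layers; a zero prediction is used here as a placeholder.
    eps_pred = np.zeros_like(eps)
    print(leave_one_out_splits(4))
    print(noise_prediction_loss(eps, eps_pred))
```

In this reading, each held-out view supplies a reference image whose rendering from a model trained on the remaining views is degraded, which yields the image pairs the captions refer to.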