GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting

Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

TL;DR

GeoGS3D tackles single-view 3D reconstruction in two stages. It first generates geometry-aware multi-view images from a single input using an orthogonal-plane decomposition within a diffusion framework. It then reconstructs a 3D Gaussian representation by fusing the views with epipolar attention, and accelerates optimization with Gaussian Divergent Significance (GDS). On the Objaverse and Google Scanned Objects (GSO) datasets, the approach shows strong multi-view consistency and high-quality 3D geometry, outperforming state-of-the-art image-to-3D baselines. The framework builds on pre-trained 2D diffusion models to obtain detailed 3D reconstructions with improved efficiency and geometric fidelity.

Abstract

We introduce GeoGS3D, a novel two-stage framework for reconstructing detailed 3D objects from single-view images. Inspired by the success of pre-trained 2D diffusion models, our method incorporates an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, facilitating the generation of multi-view consistent images. In the subsequent Gaussian Splatting stage, these images are fused with epipolar attention, fully utilizing the geometric correlations across views. Moreover, we propose a novel metric, Gaussian Divergent Significance (GDS), to prune unnecessary operations during optimization, significantly accelerating the reconstruction process. Extensive experiments demonstrate, both qualitatively and quantitatively, that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.

Paper Structure

This paper contains 31 sections, 16 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: GeoGS3D for single image to 3D generation: GeoGS3D can reconstruct 3D content with detailed geometry and accurate appearance from a single image.
  • Figure 2: Overview of our method. In the generation stage, we extract 3D features from the single input image by decoupling the orthogonal planes, and feed them into the UNet to generate high-quality multi-view images. In the reconstruction stage, we leverage epipolar attention to fuse images from different viewpoints. We further leverage Gaussian Divergent Significance (GDS) to accelerate the adaptive density control during optimization, allowing competitive training and inference time.
  • Figure 3: Illustration of the epipolar line and epipolar attention. The epipolar line for a given feature point in one view is the line on which the corresponding feature point in the other view must lie, based on the known geometric transformation.
  • Figure 4: Qualitative comparisons of generated multi-view images from the Objaverse dataset. The artifacts are marked with red boxes. Our method achieves better consistency and visual quality.
  • Figure 5: Qualitative comparisons for image-to-3D. The first two rows are from the Objaverse dataset, the next two are from the GSO dataset, and the final row is from an in-the-wild image. Our method demonstrates superior performance in terms of both visual fidelity and accuracy compared to existing approaches.
  • ...and 1 more figure
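
The epipolar constraint described in the Figure 3 caption can be illustrated with a small, self-contained sketch. This is not the paper's implementation: it assumes a simple pinhole camera shared by both views (hypothetical focal length f and principal point cx, cy), a known relative pose X2 = R X1 + t, and plain-Python 3x3 helpers whose names are chosen only for this example. It builds the fundamental matrix F, maps a pixel in view 1 to its epipolar line in view 2, and checks that the true corresponding pixel lies on that line.

```python
import math

def matmul(a, b):
    # Product of two small matrices given as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def skew(t):
    # Cross-product matrix [t]_x, so that skew(t) @ v == t x v.
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def intrinsics_inverse(f, cx, cy):
    # Analytic inverse of K = [[f, 0, cx], [0, f, cy], [0, 0, 1]].
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def fundamental_matrix(R, t, f, cx, cy):
    # F = K^-T [t]_x R K^-1, so that p2^T F p1 = 0 for matching pixels.
    K_inv = intrinsics_inverse(f, cx, cy)
    E = matmul(skew(t), R)  # essential matrix
    return matmul(transpose(K_inv), matmul(E, K_inv))

def epipolar_line(F, p1):
    # Line (a, b, c) in view 2 with a*u + b*v + c = 0 for pixel p1 in view 1.
    return matvec(F, [p1[0], p1[1], 1.0])

def point_line_distance(line, p):
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / math.hypot(a, b)

def project(X, f, cx, cy):
    # Pinhole projection of a 3D point in camera coordinates.
    return (f * X[0] / X[2] + cx, f * X[1] / X[2] + cy)

# Assumed example setup: view 2 is rotated 30 degrees about the y-axis.
theta = math.radians(30.0)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [-1.0, 0.0, 0.5]
f, cx, cy = 500.0, 128.0, 128.0

X1 = [0.2, -0.1, 3.0]                           # 3D point in view-1 coordinates
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]  # same point in view-2 coordinates
p1 = project(X1, f, cx, cy)
p2 = project(X2, f, cx, cy)

F = fundamental_matrix(R, t, f, cx, cy)
line = epipolar_line(F, p1)
dist = point_line_distance(line, p2)
print(f"distance of true match to epipolar line: {dist:.2e} px")
```

An attention mechanism could use such a point-to-line distance to restrict or weight which features in the other view a query attends to; how GeoGS3D does this in detail is not specified in this summary.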