GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng

TL;DR

GSD is a diffusion-based approach to single-view 3D object reconstruction built on the Gaussian Splatting (GS) representation. An unconditional diffusion model learns to generate 3D objects represented by sets of GS ellipsoids, and view guidance during sampling turns this generative prior into a reconstruction method without further fine-tuning.

Abstract

We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an unconditional diffusion model. This model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, the diffusion model, although trained unconditionally, is ready for view-guided reconstruction without further model fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is employed to enhance rendering fidelity and to improve reconstructed GS quality by polishing and re-using the rendered images. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach. Project page: https://yxmu.foo/GSD/
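
The guided denoising idea in the abstract can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `render` is a toy orthographic projection standing in for the splatting function, `denoise` is a toy prior that places points on the unit sphere in place of the trained GS diffusion model, and the update rule (nudging the clean-sample estimate along the negative image-loss gradient, then re-noising) is one common form of loss-guided sampling rather than the paper's exact rule.

```python
import math
import random

def render(points):
    """Toy stand-in for the splatting function: keep (x, y), drop depth."""
    return [(x, y) for x, y, _ in points]

def image_loss(points, target):
    """Toy image loss: summed squared distance between rendered and target 2D points."""
    return sum((u - tu) ** 2 + (v - tv) ** 2
               for (u, v), (tu, tv) in zip(render(points), target))

def image_loss_grad(points, target):
    """Analytic gradient of image_loss w.r.t. each 3D point (depth gets no gradient)."""
    return [(2 * (x - tu), 2 * (y - tv), 0.0)
            for (x, y, _), (tu, tv) in zip(points, target)]

def denoise(points):
    """Toy unconditional prior (assumption): clean samples lie on the unit sphere."""
    out = []
    for p in points:
        norm = max(math.sqrt(sum(c * c for c in p)), 1e-12)
        out.append(tuple(c / norm for c in p))
    return out

def guided_sample(target, steps=50, guidance=0.25, seed=0):
    """Denoise from random points, nudging each clean estimate toward the target view."""
    rng = random.Random(seed)
    pts = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in target]
    for t in range(steps, 0, -1):
        x0 = denoise(pts)                      # prior's estimate of the clean sample
        grad = image_loss_grad(x0, target)     # view-space guidance signal
        x0 = [tuple(c - guidance * g for c, g in zip(p, gp))
              for p, gp in zip(x0, grad)]
        sigma = (t - 1) / steps                # noise level left after this step
        pts = [tuple(c + sigma * rng.gauss(0, 1) for c in p) for p in x0]
    return pts

target = [(0.3, 0.1), (-0.2, 0.4), (0.0, -0.5), (0.4, -0.3)]
guided = guided_sample(target)
unguided = guided_sample(target, guidance=0.0)
print(image_loss(guided, target), image_loss(unguided, target))
```

In this toy setting the input view only constrains the image-plane coordinates, while the depth of each point comes from the prior, which mirrors the division of labor the abstract describes between view guidance and the unconditional 3D prior.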

Paper Structure

This paper contains 13 sections, 7 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: An illustration of our View-Guided Gaussian Splatting Diffusion framework for single-view 3D reconstruction. It works by progressively denoising a randomly initialized set of Gaussian Splatting (GS) ellipsoids with continuous guidance from the discrepancies between the input and rendered images. The gray arrow represents the splatting-based GS rendering, while the orange arrow depicts the backpropagation of guidance gradients. The diffusion model, built directly upon the GS representation, provides explicit geometry information. The view-guided sampling takes advantage of the splatting function to faithfully yet efficiently obtain fine-grained features from the given view.
  • Figure 2: Unconditional generation of GS DiT on the Hydrant dataset. Ten distinct samples are generated from our unconditional diffusion model, which is trained on more than 500 hydrant scenes using the GS representation. Our diffusion model shows an appealing ability to model the generative priors of 3D objects.
  • Figure 3: Approach Overview. (a) An unconditional diffusion model is trained on objects represented by $N$ GS ellipsoids ($N=1024$). After training, the GS ellipsoids of an object can be generated through $T$ denoising steps (\ref{subsec:GSd}). (b) At inference time, we apply view-space loss guidance at each denoising step. The gray arrow represents the splatting-based GS rendering, while the orange arrow depicts the backpropagation of guidance gradients. The GS object is rendered from the input view through the splatting function $f_{\text{splat}}$ and compared with the given image using $\mathcal{L}_{img}$; the gradients are backpropagated to the diffusion model to adjust the sampling process (\ref{subsec:guidedsample}). (c) A 2D diffusion model is employed to enhance the fidelity of views rendered from the reconstructed GS $x_0$. (d) The refined synthetic view images are then re-used to improve GS reconstruction quality in an alternating, iterative manner (\ref{subsec:3d2d}). We obtain the final reconstructed GS object $x_0$ from the last run of GS diffusion.
  • Figure 4: View synthesis qualitative results from single-view reconstruction. We show novel view synthesis results for objects reconstructed from a single input view of a hydrant, bench, donut, and teddy bear. Our method takes as input the raw view together with an object mask, at various resolutions. Notably, once the reconstructed 3D representation is obtained, our novel views are rendered from the GS in real time.
  • Figure 5: Additional view synthesis qualitative results from single-view reconstruction. We show three novel views rendered from the object reconstructed from single-view input shown on the left.
  • ...and 6 more figures
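
The alternating enhancement described in panels (c) and (d) of Figure 3 can be read as a simple control loop. The sketch below shows only that control flow under stated assumptions: `reconstruct`, `render`, and `refine_2d` are hypothetical callables standing in for a view-guided GS diffusion run, the splatting renderer, and the 2D diffusion enhancer, and the names `novel_cameras` and `rounds` are illustrative parameters that do not come from the paper.

```python
def alternating_enhancement(input_view, novel_cameras, reconstruct, render,
                            refine_2d, rounds=2):
    """Alternate 3D reconstruction with 2D refinement of rendered views.

    input_view    : (camera, image) pair for the given view
    novel_cameras : cameras at which synthetic views are rendered
    reconstruct   : hypothetical callable, list of (camera, image) -> GS object
    render        : hypothetical callable, (GS object, camera) -> image
    refine_2d     : hypothetical callable, image -> enhanced image
    """
    # First pass: reconstruct from the single input view only.
    gs = reconstruct([input_view])
    for _ in range(rounds):
        # Render novel views from the current GS and polish them in 2D.
        refined = [(cam, refine_2d(render(gs, cam))) for cam in novel_cameras]
        # Re-run the reconstruction with the input view plus the refined views.
        gs = reconstruct([input_view] + refined)
    # The result comes from the last reconstruction run.
    return gs
```

Passing the three stages in as callables keeps the loop independent of any particular renderer or diffusion model, which makes it easy to exercise with stand-in functions before plugging in real components.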