GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise

Xinhai Li, Huaibin Wang, Kuo-Kun Tseng

TL;DR

This paper introduces Gaussian Diffusion, a novel text-to-3D content generation framework based on Gaussian Splatting that produces more realistic renderings. According to the authors, it is the first comprehensive use of Gaussian Diffusion across the entire spectrum of 3D content generation processes.

Abstract

Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, NeRF's pixel-wise rendering and ray-marching light sampling constrain rendering speed, limiting its utility in downstream industrial applications. Gaussian Splatting has recently begun to replace the pointwise sampling commonly used in NeRF-based methods, and it is changing various aspects of 3D reconstruction. This paper introduces Gaussian Diffusion, a novel text-to-3D content generation framework based on Gaussian Splatting that produces more realistic renderings. The difficulty of achieving multi-view consistency in 3D generation significantly limits modeling complexity and accuracy. Taking inspiration from SJC, we perturb images rendered by 3D Gaussian Splatting with multi-view noise distributions, aiming to rectify inconsistencies in multi-view geometry. We devise an efficient method that generates Gaussian noise for diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts such as floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian Splatting technique to enhance the quality and stability of the 3D appearance. To our knowledge, our approach represents the first comprehensive use of Gaussian Diffusion across the entire spectrum of 3D content generation processes.
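
The idea of drawing per-view Gaussian noise from one shared source can be illustrated with a toy sketch. The abstract does not specify the authors' construction, so the scheme below is a stand-in assumption: a shared 3D noise volume is indexed by camera yaw and pixel position, so every pixel is still marginally $N(0,1)$ while pixels in different views that look at the same 3D cell receive the same noise value. The names `make_shared_noise` and `view_noise`, the single-plane camera model, and the grid resolution are all hypothetical.

```python
import math
import random


def make_shared_noise(res, seed=0):
    """Build a res x res x res volume of i.i.d. N(0, 1) samples (the shared source)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(res)]
             for _ in range(res)]
            for _ in range(res)]


def view_noise(shared, yaw, size):
    """Return a size x size noise image for a camera rotated by `yaw` radians.

    Hypothetical stand-in, not the paper's scheme: each pixel (u, v) is placed
    on a plane through the origin, the plane is rotated about the vertical
    axis by `yaw`, and the pixel takes the value of the nearest volume cell.
    """
    res = len(shared)

    def to_index(c):
        # Map a coordinate in [-1, 1] to the nearest cell index in [0, res - 1].
        return int(round((c + 1.0) / 2.0 * (res - 1)))

    image = []
    for i in range(size):
        v = -1.0 + 2.0 * i / (size - 1)
        row = []
        for j in range(size):
            u = -1.0 + 2.0 * j / (size - 1)
            x, y, z = u * math.cos(yaw), v, u * math.sin(yaw)
            row.append(shared[to_index(x)][to_index(y)][to_index(z)])
        image.append(row)
    return image


shared = make_shared_noise(res=33)
front = view_noise(shared, 0.0, 33)
side = view_noise(shared, math.pi / 2, 33)
back = view_noise(shared, math.pi, 33)
```

In this toy setup the front and side views differ overall but agree along the central column, which lies on the rotation axis, and the back view is the horizontal mirror of the front view. Nearest-cell lookup can assign the same cell to neighbouring pixels at oblique angles, so a practical scheme would need more care to keep the noise independent within a single image.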

Paper Structure (20 sections, 13 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: GaussianDiffusion text-to-3D generation, given the text prompts 'a corgi' and 'a hamburger'.
  • Figure 2: Comparison of convergence speed.
  • Figure 3: Structured Noise. The left portion of the figure shows the SJC method, which adds noise to $x_{\pi}$ to gradually transform it into a standard normal distribution $N(0,I)$ and evaluates $D(x_{\pi} + \sigma n_i; \sigma)$ with the diffusion model. The right portion shows our structured noise approach, which generates additional $N(0,1)$ distributions tied to both pose and pixel position from the same noise source. This establishes inherent noise constraints between images generated from different viewpoints, addressing the multi-view consistency problem.
  • Figure 4: Variational Gaussian Splatting. The left portion shows the SJC method [wang2023score], which adds noise to $x_{\pi}$ to gradually transform it into a standard normal distribution $N(0, I)$ and then evaluates $D(x_{\pi} + \sigma n_i; \sigma)$ with the diffusion model. The right portion shows the variational Gaussian splatting method, which places a pre-designed Gaussian model on the parameters $\theta$. During the backward pass, the gradient is propagated to the mean, while the variance retains the noise level introduced by the diffusion model. The objective is to learn a distribution that conforms more accurately to the correct parameter space by introducing slight variations within a defined range. The distribution points on the triangle are determined by jittering, and the mean of the distribution is then used as the value for forward inference.
  • Figure 5: GaussianDiffusion Framework. We apply the Semantic Code Sampling module [seo2023let] to restrict the entire 3D scene to a single semantic identity. An optimized image, derived from Semantic Code Sampling, is used to generate a sparse point cloud with Point-E [nichol2022point]. This point cloud is then pose-projected into a depth map, which serves as a constraint for ControlNet [zhang2023adding]. Concurrently, LoRA [hu2021lora] is used to fine-tune the diffusion model. The sparse point cloud produced by Point-E acts as the initial input to Gaussian Splatting [kerbl20233d]. Using SDS [poole2022dreamfusion], the gradient of the diffusion model is propagated to Gaussian Splatting. To address multi-view inconsistency and artifacts such as floaters, burrs, or proliferative elements, we introduce Structured Noise and the Variational Gaussian Splatting method to produce a realistic 3D appearance.
  • ...and 3 more figures
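
The update described in the Figure 4 caption (jitter the parameters, send the gradient to the mean, use the mean at inference) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic loss, the linearly decaying `sigma` schedule standing in for the diffusion noise level, and the names `variational_step` and `toy_grad` are all hypothetical.

```python
import random


def variational_step(mu, sigma, grad_fn, lr, rng):
    """One update of the mean `mu` of a Gaussian over the parameters theta.

    theta is sampled as mu + sigma * eps with eps ~ N(0, 1) (jittering).
    Because d(theta)/d(mu) = 1, the gradient evaluated at the jittered theta
    is applied directly to mu; sigma itself is not learned.
    """
    theta = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
    grad = grad_fn(theta)
    return [m - lr * g for m, g in zip(mu, grad)]


# Toy objective (assumption): squared distance to a fixed target vector.
target = [1.0, -2.0, 0.5]


def toy_grad(theta):
    """Gradient of sum((theta - target)^2) with respect to theta."""
    return [2.0 * (t - c) for t, c in zip(theta, target)]


rng = random.Random(0)
mu = [0.0, 0.0, 0.0]
steps = 500
for step in range(steps):
    # Hypothetical schedule standing in for the diffusion noise level.
    sigma = 0.1 * (1.0 - step / steps)
    mu = variational_step(mu, sigma, toy_grad, lr=0.05, rng=rng)

# At inference time the mean is used as the parameter value.
theta_inference = mu
```

The jitter means each gradient is evaluated at a slightly perturbed parameter value, which is one plausible reading of "introducing slight variations within a defined range". In the real method the parameters would be those of the 3D Gaussians and the gradient would come from the diffusion model rather than a closed-form loss.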