GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats

Sangeek Hyun, Jae-Pil Heo

TL;DR

This paper introduces a 3D GAN generator built on a hierarchical multi-scale Gaussian representation that regularizes the position and scale of the generated Gaussians. It renders about 100× faster than state-of-the-art 3D-consistent GANs while offering comparable 3D generation capability.

Abstract

Most advances in 3D Generative Adversarial Networks (3D GANs) rely on ray casting-based volume rendering, which incurs high rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), which offers much faster rendering and an explicit 3D representation. In this paper, we adopt Gaussians as the 3D representation for 3D GANs, leveraging their efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and cannot adjust the scale of Gaussians. This leads to model divergence and visual artifacts, because there is no proper guidance for the initialized positions of Gaussians and no densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians in which finer-level Gaussians are parameterized by their coarser-level counterparts: finer-level Gaussians are positioned near their coarser-level counterparts, and their scale decreases monotonically as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that our method renders significantly faster (×100) than state-of-the-art 3D-consistent GANs with comparable 3D generation capability. Project page: https://hse1032.github.io/gsgan.
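The core mechanism in the abstract (finer Gaussians placed near their coarser parents, with scale shrinking monotonically level by level) can be illustrated with a small Python sketch. The paper's exact equations are not reproduced in this summary, so the specific choices here are assumptions: a `tanh`-bounded offset measured in units of the parent's scale, a child scale equal to the parent's times a sigmoid-gated factor capped by a hypothetical `MAX_CHILD_RATIO`, and random residuals standing in for generator outputs. `densify` borrows the operation name from the Figure 3 caption, but its body is our assumption; `build_hierarchy` and `MAX_CHILD_RATIO` are hypothetical, and the sketch merges the paper's anchors and rendered Gaussians into one `Gaussian` per node.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical cap on the child/parent scale ratio (not taken from the paper).
MAX_CHILD_RATIO = 0.5


@dataclass
class Gaussian:
    position: Vec3  # 3D center
    scale: Vec3     # per-axis extent (kept in linear space here for readability)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def densify(parent: Gaussian, residuals: List[Tuple[Vec3, Vec3]]) -> List[Gaussian]:
    """Spawn one finer-level Gaussian per (offset, scale_logit) residual pair."""
    children = []
    for offset, scale_logit in residuals:
        # Locality: tanh keeps each offset in [-1, 1], so the child center lies
        # within one parent scale of the parent center along every axis.
        pos = tuple(p + s * math.tanh(o)
                    for p, s, o in zip(parent.position, parent.scale, offset))
        # Monotonic scale: MAX_CHILD_RATIO * sigmoid(r) never exceeds
        # MAX_CHILD_RATIO, so each child axis is at most half its parent's.
        scl = tuple(s * MAX_CHILD_RATIO * sigmoid(r)
                    for s, r in zip(parent.scale, scale_logit))
        children.append(Gaussian(pos, scl))
    return children


def build_hierarchy(root: Gaussian, levels: int, children_per_node: int,
                    seed: int = 0) -> List[List[Gaussian]]:
    """Grow a coarse-to-fine hierarchy; random residuals stand in for generator outputs."""
    rng = random.Random(seed)

    def rand3() -> Vec3:
        return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    hierarchy = [[root]]
    for _ in range(levels - 1):
        finer = []
        for parent in hierarchy[-1]:
            residuals = [(rand3(), rand3()) for _ in range(children_per_node)]
            finer.extend(densify(parent, residuals))
        hierarchy.append(finer)
    return hierarchy


hier = build_hierarchy(Gaussian((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
                       levels=3, children_per_node=4)
print([len(level) for level in hier])  # [1, 4, 16]
```

Because both constraints are enforced by bounded squashing functions rather than by a loss term, they hold for any residual a generator might predict, which is one plausible way to obtain the kind of position and scale regularization the abstract describes.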

Paper Structure (39 sections, 13 equations, 15 figures, 3 tables)

Figures (15)

  • Figure 1: Generated examples from the proposed method (FFHQ-512, AFHQ-Cat-512). Our method synthesizes multi-view consistent images with significantly faster rendering by leveraging a 3D Gaussian representation. We represent a 3D scene as a composite of hierarchical Gaussians, where each level of Gaussians depicts coarse or fine details according to its level. To visualize the effect of individual Gaussians, the right-most images are rendered with reduced Gaussian scales.
  • Figure 2: Illustration and examples of the hierarchical Gaussian representation. (a) We parameterize the finer-level Gaussians by the parameters of their coarser-level counterparts to regularize the scale and position of synthesized Gaussians. (b) Examples of synthesized Gaussians across multiple hierarchy levels. Gaussians represent coarse or fine details according to their hierarchy level.
  • Figure 3: The architecture of the generator and its block. (a) The generator synthesizes multiple levels of anchors and Gaussians, which consist of the residual parameters $\hat{A}^{l}$ and $\hat{G}^l$. Anchors are used to regularize the finer-level Gaussians, while Gaussians are used for actual rendering. After generating these parameters, we combine them with the anchors from the previous level, $A^{l-1}$, through the $\texttt{densify}$ operation, as defined by the locality, scale-difference, and residual-parameter equations (green arrow). (b) The generator consists of a stack of blocks, each a sequence of attention and MLP layers. The generator is conditioned on the latent code $z$ through AdaIN and layerscale, whose modulation and scaling parameters are derived from the style code $w$.
  • Figure 4: Qualitative results of the proposed method using the truncation trick ($\psi=0.7$).
  • Figure 5: Qualitative comparison with 3D-consistent methods using the truncation trick ($\psi=0.7$).
  • ...and 10 more figures
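The Figure 3(b) caption describes each generator block as attention and MLP layers conditioned on the style code $w$ through AdaIN and layerscale, and Figures 4 and 5 sample with truncation $\psi=0.7$. The Python sketch below illustrates these three ideas on a single residual branch. Several details are assumptions not stated in the captions: a pre-norm arrangement `x + layerscale(w) * sublayer(AdaIN(x; w))`, per-token normalization over channels, linear maps from $w$ to the modulation parameters, a `tanh` stand-in for the attention/MLP sublayer, and the StyleGAN-style truncation `w_avg + psi * (w - w_avg)` with a zero vector standing in for the mean style code. All function names and dimensions are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Tokens = List[Vector]                 # one feature vector per token
Affine = Tuple[List[Vector], Vector]  # (weight[out][in], bias[out])


def affine(w: Vector, params: Affine) -> Vector:
    """Map the style code w to one value per feature channel."""
    weight, bias = params
    return [sum(wi * row[i] for i, wi in enumerate(w)) + b
            for row, b in zip(weight, bias)]


def truncate(w: Vector, w_avg: Vector, psi: float) -> Vector:
    """Truncation trick: psi < 1 pulls w toward the average style code."""
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


def adain(tokens: Tokens, gamma: Vector, beta: Vector, eps: float = 1e-5) -> Tokens:
    """Normalize each token over its channels, then apply style-derived scale and shift."""
    out = []
    for t in tokens:
        mean = sum(t) / len(t)
        var = sum((x - mean) ** 2 for x in t) / len(t)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([g * (x - mean) * inv_std + b for x, g, b in zip(t, gamma, beta)])
    return out


def styled_residual(tokens: Tokens, sublayer: Callable[[Tokens], Tokens],
                    w: Vector, p: Dict[str, Affine]) -> Tokens:
    """Compute x + layerscale(w) * sublayer(AdaIN(x; w)) for every token."""
    gamma = affine(w, p["gamma"])
    beta = affine(w, p["beta"])
    scale = affine(w, p["layerscale"])
    h = sublayer(adain(tokens, gamma, beta))
    return [[x + s * y for x, s, y in zip(t, scale, ht)] for t, ht in zip(tokens, h)]


def random_affine(rng: random.Random, d_out: int, d_in: int, bias: float) -> Affine:
    weight = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    return weight, [bias] * d_out


def mlp_stand_in(ts: Tokens) -> Tokens:
    # Placeholder for an attention or MLP sublayer.
    return [[math.tanh(x) for x in t] for t in ts]


rng = random.Random(0)
d_w, d_model, n_tokens = 8, 4, 3
params = {
    "gamma": random_affine(rng, d_model, d_w, bias=1.0),       # near-identity scale
    "beta": random_affine(rng, d_model, d_w, bias=0.0),
    "layerscale": random_affine(rng, d_model, d_w, bias=0.1),  # small residual gain
}
w = [rng.gauss(0.0, 1.0) for _ in range(d_w)]
w_trunc = truncate(w, [0.0] * d_w, psi=0.7)  # zero vector stands in for the mean w
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_tokens)]
out = styled_residual(tokens, mlp_stand_in, w_trunc, params)
```

A layerscale output near zero makes a block behave close to an identity map, which is a common reason to use layerscale in deep transformers; whether GSGAN initializes it this way is not stated in the captions.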