CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis

Florian Barthel, Wieland Morgenstern, Paul Hinzer, Anna Hilsmann, Peter Eisert

TL;DR

CGS-GAN tackles the challenge of 3D-consistent, high-resolution human head synthesis without view-conditioning. It combines a memory-efficient Gaussian-splatting generator with a lightweight multi-view regularization and a discriminator-based camera conditioning strategy, enabling consistent rendering from arbitrary viewpoints up to $2048^2$. The authors also curate FFHQ-Clean, a high-quality FFHQ-derived dataset designed to reduce view-dependent artifacts and occlusions. Empirical results show competitive FID scores and strong 3D consistency, supported by ablations demonstrating the stability benefits of multi-view regularization and random background augmentation. This approach facilitates exporting 3D heads to explicit environments, with potential extensions to back-of-head synthesis and morphable models for animation.

Abstract

Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing the training to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian Splatting GAN framework that enables stable training and high-quality 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed to not only stabilize training but also facilitate efficient rendering and straightforward scaling, enabling output resolutions up to $2048^2$. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images where subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation. Check out our project page here: https://fraunhoferhhi.github.io/cgs-gan/
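The difference between view-conditioned synthesis and the approach described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `generate_scene`, `render`, and `multi_view_step` are hypothetical names, the "scene" is a handful of random 3D points rather than Gaussian splats, and the number of views per step and the yaw range are assumed parameters. What it shows is the structure of the idea: the scene is generated once from the latent input alone, then rendered from several cameras, so every view shares the same underlying 3D content.

```python
import math
import random

def generate_scene(z_seed, num_points=8):
    """Hypothetical generator stand-in: maps a latent seed to 3D points.
    No camera parameter enters here, i.e. there is no view-conditioning."""
    rng = random.Random(z_seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
            for _ in range(num_points)]

def render(scene, yaw):
    """Toy renderer: rotate points about the vertical axis by `yaw` (radians)
    and project orthographically onto the image plane (x, y)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y) for (x, y, z) in scene]

def multi_view_step(z_seed, num_views=4, max_yaw=math.pi / 3, rng=None):
    """One generator pass, several renders: the scene is built once and
    then viewed from `num_views` randomly sampled yaw angles."""
    if rng is None:
        rng = random.Random(0)
    scene = generate_scene(z_seed)                    # single forward pass
    yaws = [rng.uniform(-max_yaw, max_yaw) for _ in range(num_views)]
    views = [render(scene, yaw) for yaw in yaws]      # per-view rendering
    return scene, yaws, views

scene, yaws, views = multi_view_step(z_seed=42)
print(len(views), "views rendered from one scene of", len(scene), "points")
```

In a view-conditioned setup, by contrast, `generate_scene` would also receive the camera pose, so each new viewpoint could yield a different scene; that is the source of the identity changes the abstract describes.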

Paper Structure

This paper contains 17 sections, 1 equation, 17 figures, 3 tables.

Figures (17)

  • Figure 1: Example renderings at $1024^2$ resolution produced by our 3D Consistent Gaussian splatting GAN (CGS-GAN). Unlike prior methods that recompute the scene for each individual view to ensure high quality, our method is capable of preserving quality while synthesizing a fully 3D consistent scene that can be ported into explicit 3D settings like game engines or VR environments.
  • Figure 2: Overview of the proposed CGS-GAN framework. The generator is built on the generator of GSGAN [hyun2024gsgan] with key modifications for improved 3D consistency and scalability. In contrast to view-conditioned approaches, our method omits camera labels in the mapping network and instead stabilizes training through efficient multi-view rendering of the 3DGS head during each training step. Additionally, we render with random backgrounds to reduce hole artifacts.
  • Figure 3: Visualization of how GGHead (left) and GSGAN (right) rely on view-conditioning to achieve good-quality renderings from any angle. As soon as the view-conditioning does not align with the camera pose, quality degrades. Additionally, we use a fixed latent vector for all four images, underlining that view-conditioning also changes the identity and expression.
  • Figure 4: Our data processing pipeline: 1. remove images with occluders to enhance training, 2. recrop the entire head at $2048^2$ resolution, 3. apply background masking, 4. rebalance the smiling bias, and finally 5. rebalance the camera positions.
  • Figure 5: A comparison between GGHead, GSGAN, and our proposed method. For prior methods, we apply conditioning on the current view (first rows), resulting in good but inconsistent identities, and conditioning on a left side view (bottom rows), resulting in a consistent scene with poor quality for novel views. In contrast, our method creates high-quality renderings for the whole rotation for FFHQ and FFHQC (last row).
  • ...and 12 more figures
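
The random-background rendering mentioned in the Figure 2 caption can be illustrated with standard alpha compositing. The sketch below is an assumption about how such an augmentation could look, not the paper's code: `random_background`, `composite_over`, and `render_with_random_background` are hypothetical helpers, pixels are plain non-premultiplied RGB tuples in [0, 1], and a single uniform background color per image is assumed. One plausible reading of the caption is that with a fixed background color, a hole (low alpha) in the head can go unnoticed wherever the background happens to match the surrounding colors, whereas a randomized color makes such holes show up in the rendered image.

```python
import random

def random_background(rng):
    """Draw one uniform RGB background color in [0, 1] (assumed scheme)."""
    return (rng.random(), rng.random(), rng.random())

def composite_over(rgb, alpha, background):
    """Standard 'over' compositing of a non-premultiplied foreground color
    with opacity `alpha` onto an opaque background color."""
    return tuple(alpha * c + (1.0 - alpha) * b
                 for c, b in zip(rgb, background))

def render_with_random_background(pixels, alphas, rng):
    """Composite every foreground pixel onto one freshly sampled background.
    Returns the composited pixels and the background that was used."""
    bg = random_background(rng)
    image = [composite_over(p, a, bg) for p, a in zip(pixels, alphas)]
    return image, bg

rng = random.Random(0)
pixels = [(1.0, 0.0, 0.0), (0.2, 0.4, 0.6)]   # two foreground colors
alphas = [1.0, 0.0]                            # opaque pixel, then a "hole"
image, bg = render_with_random_background(pixels, alphas, rng)
print(image[0])          # opaque pixel keeps its own color
print(image[1] == bg)    # the hole shows the random background
```

The fully transparent pixel takes on whatever background was sampled, so its appearance changes from one render to the next, while the opaque pixel is unaffected.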