SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han

TL;DR

SphereHead tackles the challenge of 3D-aware full-head synthesis from all viewpoints by introducing a dual spherical tri-plane representation that aligns with head geometry and reduces mirroring artifacts. It couples this with a view-image consistency loss (ViCo) that forces the discriminator to consider camera pose congruence, preventing high-quality but misoriented back-view generations. The framework also incorporates a parsing branch for semantic embedding and delivers a new WildHead 60k-head dataset to support community research. Empirical results show superior performance over state-of-the-art 3D-aware GANs in both qualitative and quantitative metrics, along with user studies confirming reduced artifacts and improved realism. The work provides open-source tools for data processing and a public dataset to foster broader research in 360-degree full-head synthesis.

Abstract

While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead demonstrates the possibility of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often produces artifacts in back views. Based on our in-depth analysis, we found that the reasons are mainly twofold. First, from a network architecture perspective, each plane in the tri-plane/tri-grid representation space tends to confuse the features from its two sides, causing "mirroring" artifacts (e.g., glasses appearing on the back of the head). Second, from a data supervision perspective, existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself and pays little attention to whether the image is plausible for the viewpoint from which it was rendered. This makes it possible to generate a "face" in non-frontal views, since such images easily fool the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence between the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.
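The view-image consistency idea in the abstract can be illustrated with a small sketch. It assumes a pose-conditioned discriminator that outputs one logit per (image, camera label) pair, and a non-saturating softplus GAN loss; neither is confirmed by the text above. The function names, the `vico_weight` parameter, and the roll-by-one way of producing mismatched labels are hypothetical, chosen only to show how real images paired with wrong camera labels could act as extra negatives.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mismatch_labels(labels):
    """Hypothetical way to build mismatched labels: rotate the batch by one.

    In a real batch some rotated labels could coincide with the true ones;
    this sketch ignores that case.
    """
    return labels[1:] + labels[:1]


def discriminator_loss(logit_real_matched, logit_fake,
                       logit_real_mismatched, vico_weight=1.0):
    """Assumed discriminator objective with a view-image consistency term.

    logit_real_matched:    D(real image, its true camera label)
    logit_fake:            D(generated image, the label it was rendered with)
    logit_real_mismatched: D(real image, a mismatched camera label)
    """
    loss_real = softplus(-logit_real_matched)    # push matched real pairs up
    loss_fake = softplus(logit_fake)             # push generated pairs down
    loss_vico = softplus(logit_real_mismatched)  # push mismatched pairs down
    return loss_real + loss_fake + vico_weight * loss_vico


# A discriminator that ignores pose scores matched and mismatched real pairs
# identically, so the extra term penalizes it.
pose_blind = discriminator_loss(4.0, -4.0, 4.0)
pose_aware = discriminator_loss(4.0, -4.0, -4.0)
print(pose_aware < pose_blind)
```

Under these assumptions, the only way for the discriminator to lower the third term is to use the camera label, which is the behavior the abstract asks for.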

Paper Structure (31 sections, 10 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: Given input images, applying PTI [roich2022pivotal] to reconstruct full heads in the PanoHead [an2023panohead] space often yields noticeable artifacts (green box), while reconstructions in our SphereHead space (blue box) appear more realistic.
  • Figure 2: Two types of fake-face artifacts in PanoHead. (a-b) We call the first type mirroring-face artifacts, because the back face precisely mirrors the identity, expression, and accessories of the front face. (c-d) We call the second type multiple-face artifacts, because there may be more than one fake face, and their identities, expressions, and accessories differ from those of the front face.
  • Figure 3: The framework of our proposed SphereHead. Given a sampled code $z$ and camera parameter $c$, SphereHead synthesizes spherical tri-plane features $f_F$ by fusing two sub-feature groups $f_A$ and $f_B$. By volumetric rendering with features sampled from $f_F$, SphereHead generates high-quality, view-consistent full-head images $I^{+}$. Guided by our view-image consistency loss, the discriminator learns to focus on the alignment between images and their viewpoints through additional negative data pairs consisting of real images and mismatched labels $c_s$.
  • Figure 4: (a) Tri-plane representation. (b) Spherical tri-plane representation. Reconstructed head geometry from a single (c) sphere A and (d) sphere B, each showing (i) seam artifacts and (ii) polar artifacts. (e) The combination of two spheres in the dual spherical tri-plane representation: (i) the seam of sphere A, (ii) the seam of sphere B. (f) Fusion weight map. (g-h) For each sphere, the weight approaches zero as the location nears the seam and poles.
  • Figure 5: Qualitative comparison with state-of-the-art methods. (a) GIRAFFE HD [xue2022giraffe], (b) StyleSDF [or2022stylesdf], and (c) EG3D [chan2022efficient] fail to capture the complete head geometry and appearance. (d-f) PanoHead [an2023panohead] generates complete heads, but the results suffer from mirroring artifacts ((d) left-right identical mirroring artifacts and (f) mirroring-face artifacts) and (e) multiple-face artifacts. (g-l) Our SphereHead synthesizes full-head images of high visual quality and is free of the artifacts exhibited by other methods.
  • ...and 8 more figures
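
The dual-sphere fusion described in the Figure 4 caption can be sketched as follows. This is only an illustration under stated assumptions: the axis convention (+y up, +z toward the face), the rotation that defines sphere B, and the weight formula are all hypothetical, picked so that each sphere's weight vanishes at its own seam and poles as the caption describes. The paper's actual parameterization and weight map may differ.

```python
import math


def to_spherical(x, y, z):
    """Return (r, theta, phi) for sphere A.

    Assumed convention: theta is the polar angle from +y in [0, pi];
    phi is the azimuth about the y-axis, 0 at the front (+z), and it
    wraps at +/-pi at the back, which is where the seam lies.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    theta = math.acos(max(-1.0, min(1.0, y / r)))
    phi = math.atan2(x, z)
    return r, theta, phi


def to_spherical_b(x, y, z):
    """Sphere B: the same conversion after a hypothetical rotation
    (x, y, z) -> (y, x, -z), which moves the poles onto the x-axis
    and the seam to the front."""
    return to_spherical(y, x, -z)


def seam_pole_weight(theta, phi):
    """Hypothetical weight: 0 at the poles (theta = 0 or pi) and at the
    seam (phi = +/-pi), 1 at the point farthest from both."""
    return math.sin(theta) * 0.5 * (1.0 + math.cos(phi))


def fuse(point, feature_a, feature_b):
    """Blend the two spheres' features with normalized weights."""
    _, theta_a, phi_a = to_spherical(*point)
    _, theta_b, phi_b = to_spherical_b(*point)
    w_a = seam_pole_weight(theta_a, phi_a)
    w_b = seam_pole_weight(theta_b, phi_b)
    total = w_a + w_b
    if total == 0.0:  # with this construction, only at the origin
        return 0.5 * (feature_a + feature_b)
    return (w_a * feature_a + w_b * feature_b) / total


# On sphere A's seam (back of the head) the fused value comes from sphere B,
# and on sphere B's seam (front) it comes from sphere A.
print(fuse((0.0, 0.0, -1.0), 1.0, 2.0))
print(fuse((0.0, 0.0, 1.0), 1.0, 2.0))
```

With this choice of rotation, the zero-weight regions of the two spheres meet only at the origin, so every other point is covered by at least one sphere with a positive weight.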