Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation

Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang

TL;DR

This paper proposes a conditional GNeRF model that takes specific attribute labels as input, improving the controllability and disentanglement of 3D-aware generative models.

Abstract

Generative Neural Radiance Fields (GNeRF)-based 3D-aware GANs generate high-fidelity images while maintaining strong 3D consistency, particularly for face generation. However, some existing models prioritize view consistency over disentanglement, which limits semantic or attribute control during generation. While many methods incorporate semantic masks or 3D Morphable Model (3DMM) priors to add semantic control, they often require training from scratch, which incurs significant computational overhead. In this paper, we propose a conditional GNeRF model that takes specific attribute labels as input, improving the controllability and disentanglement of 3D-aware generative models. Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and Optimizing for Tuning (TRIOT) method that first trains a conditional normalizing flow module to enable facial attribute editing, then optimizes the latent vector to further improve attribute-editing precision. Extensive experiments show that our model generates high-quality edits with improved view consistency while preserving non-target regions. The code for our model is publicly available at https://github.com/zhangqianhui/TT-GNeRF.

Paper Structure (14 sections, 8 equations, 16 figures, 3 tables)

Figures (16)

  • Figure 1: Our method produces controllable 3D-aware face generation given specific attributes as guidance (first two rows), together with the corresponding normals (bottom two rows). As the normal images show, the geometry is preserved for the attribute "Hair Color", while the mouth region of the "Smiling" mesh changes.
  • Figure 2: Overview of the "Training as Init, Optimizing for Tuning" method (attribute "Expression" as an example). First, given the pretrained StyleSDF (Or-El et al.), we follow StyleFlow and train continuous normalizing flows (CNF) to learn the conditional distribution of the latent code $w$ (Left). Attribute editing is enabled by supplying target labels $A_{e}$ to manipulate the latent code. Second, we treat the edited latent code $w_{e}$ as the initial result, then iteratively optimize $w_{e}$ to search for a better one using the proposed mask-based geometry and texture losses (Right).
  • Figure 3: Semantic decomposition obtained by applying clustering to the pretrained EG3D model. Here, $k$ is the number of clusters. The second and third columns show the results of k-means clustering on the rendering features $\pmb{S_{B}}$ from neural rendering. The final column shows rendered semantic maps obtained by k-means clustering on the 3D volumes $\pmb{S_{A}}$.
  • Figure 4: Corresponding mask $M$ for all attributes.
  • Figure 5: Reference-based geometry transfer pipeline. It minimizes the difference between normals (geometry loss) and the differences between texture images (texture loss) in perceptual space to search for a better $\pmb{\hat{w}_{e}}$.
  • ...and 11 more figures
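
The tuning stage described in the captions for Figures 2 and 5 (start from the edited latent code $w_{e}$, then optimize it under a mask-based loss) can be illustrated with a toy example. The sketch below is not the authors' implementation: the linear "renderer" `G`, the plain squared-error loss (standing in for the perceptual geometry and texture losses), the assumption that the loss penalizes changes outside the attribute mask $M$, and all names and values are hypothetical and chosen only to keep the example small and runnable.

```python
# Toy sketch of "optimizing for tuning": refine an edited latent code so that
# pixels OUTSIDE the attribute mask return to the original render.
# Everything here (linear renderer, L2 loss, values) is a hypothetical stand-in.

def render(G, w):
    """Hypothetical linear renderer: one output pixel per row of G."""
    return [sum(g * x for g, x in zip(row, w)) for row in G]

def masked_loss(G, w, ref, mask):
    """Squared error against the reference render, counted only where mask == 0."""
    return sum((1 - m) * (p - r) ** 2 for p, r, m in zip(render(G, w), ref, mask))

def tune_latent(G, w_init, ref, mask, lr=0.2, steps=300):
    """Gradient descent on masked_loss, initialised at the edited latent code."""
    w = list(w_init)
    for _ in range(steps):
        pixels = render(G, w)
        grad = [0.0] * len(w)
        for row, p, r, m in zip(G, pixels, ref, mask):
            coeff = 2.0 * (1 - m) * (p - r)  # derivative of the squared error
            for j, g in enumerate(row):
                grad[j] += coeff * g
        w = [x - lr * g for x, g in zip(w, grad)]
    return w

# Four "pixels" driven by a three-dimensional latent code.
G = [
    [1.0, 0.0, 0.0],  # pixel 0: inside the attribute region
    [0.0, 1.0, 0.0],  # pixel 1: inside the attribute region
    [0.5, 0.0, 1.0],  # pixel 2: non-target region
    [0.0, 0.5, 1.0],  # pixel 3: non-target region
]
MASK = [1, 1, 0, 0]        # 1 marks the region the attribute may change
W_ORIG = [0.0, 0.0, 0.0]   # latent code before editing
W_EDIT = [1.0, 1.0, 0.2]   # edited latent code, used as initialisation
REF = render(G, W_ORIG)    # original render that non-target pixels should keep

W_TUNED = tune_latent(G, W_EDIT, REF, MASK)
LEAK_BEFORE = masked_loss(G, W_EDIT, REF, MASK)
LEAK_AFTER = masked_loss(G, W_TUNED, REF, MASK)
```

In this toy setting the non-target error (`LEAK_AFTER`) shrinks toward zero while the pixels inside the mask stay clearly different from the original render, which mirrors the stated goal of improving edits while preserving non-target regions. Initialising at the edited code rather than at a random latent is what keeps the edit in place during tuning.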