PIE: Portrait Image Embedding for Semantic Control

Ayush Tewari, Mohamed Elgharib, Mallikarjun B R., Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt

TL;DR

PIE tackles semantic editing of real portrait photos by embedding them into the StyleGAN latent space. It combines a hierarchical non-linear optimization with StyleRig, a pretrained network that maps the control space of a 3D morphable face model to the GAN latent space, and adds an identity-preservation term alongside the other losses so that the embedding is both faithful to the input and editable. The key contributions are the first embedding of real images that enables photo-realistic, disentangled edits of head pose, expression, and illumination; an explicit identity-preservation term; and a hierarchical optimization strategy for computing the embedding. Editing runs at interactive frame rates, and the method is validated through an ablation study and comparisons with the state of the art. The approach advances controllable, semantically meaningful portrait editing with potential impact on photography, entertainment, and AR workflows, while acknowledging limitations such as artifacts under large edits and dataset biases.

Abstract

Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, but only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.
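
As a rough sketch of how such an embedding objective can be composed, the energy terms named in the Figure 1 caption (synthesis, facial recognition, edit, and invariance) could be combined as a weighted sum over the latent code. The symbols $\mathbf{w}$ and $E_{\cdot}$ and the weights $\lambda_{\cdot}$ are illustrative assumptions, not the paper's notation or its exact formulation.

```latex
% Illustrative only: symbol names and weights are assumptions,
% not taken from the paper.
\begin{equation*}
\mathbf{w}^{*} = \operatorname*{arg\,min}_{\mathbf{w}} \;
  E_{\text{syn}}(\mathbf{w})
  + \lambda_{\text{rec}}\, E_{\text{rec}}(\mathbf{w})
  + \lambda_{\text{edit}}\, E_{\text{edit}}(\mathbf{w})
  + \lambda_{\text{inv}}\, E_{\text{inv}}(\mathbf{w})
\end{equation*}
```

Under this reading, the synthesis and facial recognition terms pull the generated image toward the input photo, while the edit and invariance terms keep the embedding responsive to changes in pose, expression, and illumination; the facial recognition term additionally discourages identity drift during editing.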

Paper Structure

This paper contains 43 sections, 13 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Given a portrait input image, we optimize for a StyleGAN embedding that allows faithful reproduction of the image (synthesis and facial recognition terms), editing of the image based on semantic parameters such as head pose, expression, and scene illumination (edit and invariance terms), and preservation of the facial identity during editing (facial recognition term). A novel hierarchical non-linear optimization strategy is used to compute the result. StyleGAN-generated images (image with edit parameters) are used to extract the edit parameters during optimization. At "test time", i.e., when performing portrait image editing, the image with edit parameters is not needed. Note that the identity term is not visualized here. Images from [Shih14].
  • Figure 2: Pose Editing. Our approach can handle a large variety of head pose modifications, including out-of-plane rotations, in a realistic manner. Image2StyleGAN [Abdal_2019_ICCV] embeddings often lead to artifacts when edited using StyleRig. Images from [shen2016deep].
  • Figure 3: Illumination Editing. Our approach can realistically relight portrait images. Each edited image corresponds to changing a different Spherical Harmonics coefficient while all other coefficients are kept fixed. The environment maps are visualized in the inset. Image2StyleGAN [Abdal_2019_ICCV] embeddings often lead to artifacts when edited using StyleRig. Images from [shen2016deep].
  • Figure 4: Expression Editing. Our approach can also be used to edit the facial expressions in a portrait image in a realistic manner. We obtain more plausible results than with Image2StyleGAN [Abdal_2019_ICCV] embeddings. Images from [shen2016deep] and [Shih14].
  • Figure 5: Ablative analysis of the different loss functions. "Modification" refers to the edit, invariance, and identity terms applied simultaneously. The left block shows results for editing the head pose, and the right block shows results for editing scene illumination. All losses are required to obtain high-fidelity edits. Images from [shen2016deep].
  • ...and 7 more figures