UP-FacE: User-predictable Fine-grained Face Shape Editing

Florian Strohm, Mihai Bâce, Andreas Bulling

TL;DR

UP-FacE addresses the challenge of predictable, fine-grained face editing by grounding edits in 23 landmark-derived semantic features and deploying a transformer-based latent-editing module within the StyleGAN2 framework. It introduces a scaling mechanism and a semantic face feature loss to deterministically steer a target facial feature value while preserving other features, all without manual attribute labels. Quantitative and qualitative results demonstrate precise, localized edits across 23 features and favorable FID/identity metrics compared to baselines, alongside a slider-like interface that supports practical user-driven editing. This work enables user-predictable face manipulation with robust disentanglement, laying groundwork for interactive, geometrically coherent facial editing tools.

Abstract

We present User-predictable Face Editing (UP-FacE) -- a novel method for predictable face shape editing. In stark contrast to existing methods for face editing using trial and error, edits with UP-FacE are predictable by the human user. That is, users can control the desired degree of change precisely and deterministically and know upfront the amount of change required to achieve a certain editing result. Our method leverages facial landmarks to precisely measure facial feature values, facilitating the training of UP-FacE without manually annotated attribute labels. At the core of UP-FacE is a transformer-based network that takes as input a latent vector from a pre-trained generative model and a facial feature embedding, and predicts a suitable manipulation vector. To enable user-predictable editing, a scaling layer adjusts the manipulation vector to achieve the precise desired degree of change. To ensure that the desired feature is manipulated towards the target value without altering uncorrelated features, we further introduce a novel semantic face feature loss. Qualitative and quantitative results demonstrate that UP-FacE enables precise and fine-grained control over 23 face shape features.
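The editing step described in the abstract, a manipulation vector added to the latent and scaled to reach a target feature value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: in UP-FacE the manipulation vector comes from a transformer and the scale from a learned scaling layer, whereas here the direction is a fixed toy vector and the scale is found by bisection as a plainly swapped-in stand-in. The names `apply_edit`, `solve_scale`, and `toy_measure` are hypothetical, and `toy_measure` replaces the landmark-based feature measurement.

```python
from typing import Callable, List

Vector = List[float]


def apply_edit(w: Vector, direction: Vector, k: float) -> Vector:
    """Return the edited latent w + k * direction."""
    return [wi + k * di for wi, di in zip(w, direction)]


def solve_scale(
    measure: Callable[[Vector], float],
    w: Vector,
    direction: Vector,
    target: float,
    lo: float = -5.0,
    hi: float = 5.0,
    iters: int = 60,
) -> float:
    """Find k so that measure(w + k * direction) is close to target.

    Stand-in for a learned scaling layer (assumption): plain bisection,
    valid only if the measured feature increases monotonically with k
    and the target is reachable within [lo, hi].
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if measure(apply_edit(w, direction, mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def toy_measure(w: Vector) -> float:
    """Hypothetical feature value; a real system would measure it from landmarks."""
    return 2.0 * w[0] + 0.5 * w[2]


# Toy latent and direction (illustrative values only).
w = [0.2, -0.1, 0.5]
direction = [1.0, 0.0, 0.0]

k = solve_scale(toy_measure, w, direction, target=1.25)
edited = apply_edit(w, direction, k)
print(k, toy_measure(edited))
```

Because the scale is chosen to hit the requested value rather than set by hand, the user specifies the outcome (the target feature value) instead of searching for a step size by trial and error.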

Paper Structure (12 sections, 4 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: UP-FacE allows for fine-grained control over various face shape features, such as eye width, eye and mouth openness, or eyebrow thickness. Qualitative and quantitative results demonstrate that UP-FacE enables precise and fine-grained control over 23 face shape features. In stark contrast to existing methods that require trial-and-error editing of face features, edits with UP-FacE are predictable by the human user. That is, users can control the desired degree of change precisely and deterministically and know upfront the amount of change required to achieve a certain editing result. In addition, our method enables both isolated progressive edits (i.e., repeated edits of the same feature) and sequential edits (i.e., edits of multiple different features) without altering other, unrelated facial features.
  • Figure 2: Overview of our method. The StyleGAN2 architecture is shown in grey. We inject a transformer encoder network (green) between the StyleGAN2 mapping and synthesis networks. It modifies the latent vector $w^+$ based on the desired face feature embedding $e_j$ by adding a semantic manipulation vector $s_e$. This manipulation vector is scaled by $k$, a scalar predicted by the scaling network from the current and target face feature values $m_j$ and $m_j^t$. The landmark detector and the calculated face features (blue) are only required while training the components highlighted in green.
  • Figure 3: Example progressive edits performed with UP-FacE. UP-FacE allows users to perform high-quality progressive edits easily and deterministically along many different semantic dimensions, with explicit control over the desired target feature values. For each progressive edit, we also show the difference between the first and last image, highlighting which parts of the image changed.
  • Figure 4: Sample editing results of UP-FacE on a real face image, compared with the two state-of-the-art methods GANSpace (Härkönen et al., 2020) and InterFaceGAN (Shen et al., 2020). The original image was inverted into latent space using the e4e framework (Tov et al., 2021), and the degree of smiling was subsequently edited. Also shown are the difference images between the edited and original images. UP-FacE is the only method that allows for fine-grained and deterministic control of the degree of smiling without distortions.
  • Figure 5: Correlation matrix showing the Pearson correlation between two semantic face features in each cell.
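A correlation matrix like the one in Figure 5 can be computed along the following lines. This is a minimal sketch, not the authors' code: the feature names and sample values below are made-up toy data chosen only to make the arithmetic easy to follow, and `pearson` and `correlation_matrix` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


def correlation_matrix(columns: List[Sequence[float]]) -> List[List[float]]:
    """Cell (i, j) holds the Pearson correlation of features i and j."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Toy measurements of three hypothetical features over four faces.
features: Dict[str, List[float]] = {
    "eye_width": [1.0, 2.0, 3.0, 4.0],
    "eye_distance": [2.0, 4.0, 6.0, 8.0],    # rises with eye_width
    "mouth_openness": [4.0, 3.0, 2.0, 1.0],  # falls as eye_width rises
}

corr = correlation_matrix(list(features.values()))
for name, row in zip(features, corr):
    print(name, [round(value, 3) for value in row])
```

Cells near +1 or -1 mark feature pairs that tend to change together, while cells near 0 mark pairs that vary largely independently of each other.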