AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement

Shiwei Jin, Zhen Wang, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen

TL;DR

This paper tackles fine-grained AU intensity manipulation under limited subject data by introducing AUEditNet, a dual-branch latent-space editing framework that disentangles target AUs from identity and other attributes. It operates in the $W^+$ latent space of StyleGAN2, using per-level editing modules and a level-wise label space via $\Psi_{enc}$ and $\Psi_{dec}$ to map target AU labels into editing directions, enabling conditioning by AU intensities or target images without retraining. Trained on the DISFA dataset with 18 subjects and evaluated across DISFA, BU-4DFE, CelebA-HQ, and FFHQ, AUEditNet achieves state-of-the-art AU editing accuracy, strong identity preservation, and robust cross-domain performance, while also enabling expression transfer. The approach reduces reliance on large annotated datasets and pretrained AU estimators, offering a practical solution for high-quality facial attribute editing in data-scarce regimes and opening avenues for data augmentation and controllable expression synthesis.
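As a rough illustration of level-wise editing in $W^+$, the sketch below subtracts a source direction and adds a target direction at every latent level. It is a minimal NumPy sketch, not the authors' implementation: the 18 x 512 latent shape, the function name `edit_latent`, and the random arrays standing in for the directions are all assumptions. In AUEditNet the directions would come from the learned per-level editing modules.

```python
import numpy as np

NUM_LEVELS = 18   # assumed number of W+ levels (StyleGAN2 at 1024x1024)
LATENT_DIM = 512  # assumed per-level latent width

def edit_latent(w_src, delta_src, delta_tar):
    """Remove the source attribute direction and add the target one, per level.

    Hypothetical helper: all three arrays have shape (NUM_LEVELS, LATENT_DIM).
    """
    if not (w_src.shape == delta_src.shape == delta_tar.shape):
        raise ValueError("latent code and directions must share one shape")
    return w_src - delta_src + delta_tar

# Random placeholders; a real pipeline would use an inverted image latent
# and directions predicted by trained editing modules.
rng = np.random.default_rng(0)
w_src = rng.normal(size=(NUM_LEVELS, LATENT_DIM))
delta_src = 0.1 * rng.normal(size=(NUM_LEVELS, LATENT_DIM))
delta_tar = 0.1 * rng.normal(size=(NUM_LEVELS, LATENT_DIM))

# Full edit: removal of the source state plus addition of the target state.
w_edit = edit_latent(w_src, delta_src, delta_tar)

# Removal only: no target direction is added.
w_removed = edit_latent(w_src, delta_src, np.zeros_like(delta_src))
```

Under this additive assumption, passing the source direction as the target returns the original latent code, which is a quick sanity check on any implementation of the two steps.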

Abstract

Facial action unit (AU) intensity plays a pivotal role in quantifying fine-grained expression behaviors, which makes it an effective condition for facial expression manipulation. However, publicly available datasets containing intensity annotations for multiple AUs remain severely limited and often feature a restricted number of subjects. This limitation makes AU intensity manipulation in images difficult because of disentanglement issues, leading researchers to resort to other large datasets with pseudo labels produced by pretrained AU intensity estimators. To address this constraint and fully leverage manual annotations of AU intensities for precise manipulation, we introduce AUEditNet. Our proposed model achieves impressive intensity manipulation across 12 AUs while being trained effectively with only 18 subjects. Using a dual-branch architecture, our approach achieves comprehensive disentanglement of facial attributes and identity without requiring additional loss functions or large batch sizes. This approach offers a potential solution for achieving the desired facial attribute editing despite the dataset's limited subject count. Our experiments demonstrate AUEditNet's superior accuracy in editing AU intensities, affirming its capability to disentangle facial attributes and identity within a limited subject pool. AUEditNet allows conditioning on either intensity values or target images, eliminating the need to construct AU combinations for specific facial expression synthesis. Moreover, AU intensity estimation, as a downstream task, validates the consistency between real and edited images, confirming the effectiveness of our proposed AU intensity manipulation method.

Paper Structure (31 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Overall scheme of the proposed AUEditNet. AUEditNet has a dual-branch structure that separately addresses source attribute removal (Source Branch) and target attribute addition (Target Branch). The Source Branch aims to remove the original status in $I_{src}$ while maintaining other attributes and identity and keeping them distinct from the feature space of target facial attributes (highlighted in yellow). The Target Branch focuses on determining an editing direction $\Delta \hat{W}^j_{tar}$ for the new status of the target facial attribute, ensuring its independence from identity and other facial attributes. Instead of applying this branch directly to $I_{src}$, we randomly select another image $I_{rnd}$, facilitating implicit disentanglement of attributes and identity. The blue bold arrows represent feature flows that exclude the target facial attributes. In this configuration, AUEditNet guarantees that these flows remain outside the embedding space of the target facial attributes.
  • Figure 2: Comparison of AU intensity manipulation using target AU intensities in DISFA. AUEditNet and ReDirTrans generate editing results that involve the removal ($-$) of source attributes and the addition ($+$) of target attributes. DeltaEdit uses intensity differences between source and target images for attribute addition ($+\Delta$). The removal ($-$) process yields 'neutral-like' face images with all AU intensities set to zero.
  • Figure 3: Cross-dataset evaluation of single AU intensity manipulation in CelebA-HQ. The descriptions of AUs (from top to bottom) are Outer Brow Raiser, Brow Lowerer, Upper Lid Raiser, Lip Corner Depressor, and Lips Part. $a_{tar}$ represents the target intensity.
  • Figure 4: AU intensity manipulation conditioned on target images to achieve facial expression transfer on the BU-4DFE dataset. The fine-grained facial expressions, such as AU $17$ (Chin Raiser) in 'Sadness' and AU $25$ (Lips Part) in 'Disgust', are transferred accurately.
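The conditioning labels referred to in the captions above can be sketched as plain intensity vectors. This is a hedged illustration only: the AU ordering (the 12 AUs annotated in DISFA), the 0 to 5 intensity scale, the helper name `single_au_target`, and the choice to leave the other AUs at their source values are assumptions, not details confirmed by the text above.

```python
import numpy as np

# Assumed AU ordering: the 12 AUs annotated in DISFA, each on a 0-5 scale.
AU_IDS = [1, 2, 4, 5, 6, 9, 12, 15, 17, 20, 25, 26]
MAX_INTENSITY = 5.0

def single_au_target(a_src, au_id, a_tar):
    """Copy the source labels and overwrite one AU with the target intensity.

    Hypothetical helper for single-AU manipulation as in Figure 3.
    """
    if not 0.0 <= a_tar <= MAX_INTENSITY:
        raise ValueError("target intensity outside the assumed 0-5 range")
    target = np.array(a_src, dtype=float)  # np.array makes a copy
    target[AU_IDS.index(au_id)] = a_tar
    return target

# Example source labels: only AU 12 is active, at intensity 2.
a_src = np.zeros(len(AU_IDS))
a_src[AU_IDS.index(12)] = 2.0

# Target labels for the addition (+) step: set AU 25 (Lips Part) to 3.
a_tar = single_au_target(a_src, au_id=25, a_tar=3.0)

# Labels of the 'neutral-like' face produced by removal (-): all zeros.
neutral = np.zeros_like(a_src)

# Difference-style condition (+Delta): target minus source intensities.
delta = a_tar - a_src
```

When conditioning on a target image instead, the model itself supplies the target attribute information, so no such vector has to be assembled by hand.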