GANmut: Generating and Modifying Facial Expressions

Maria Surani

TL;DR

This work extends GANmut by benchmarking its continuous, learnable emotion space for facial expression synthesis on in-the-wild data. GANmut uses a polar latent space $Z=(\theta,\rho)$ within a multi-domain GAN to capture both emotion category and intensity, and is trained with adversarial, classification, regression, interpolation, and reconstruction losses. Across Aff-Wild2 and the AffNet-augmented dataset, with either RetinaFace or MTCNN as the face detector, GANmut achieves improved qualitative realism and competitive FED and Smoothness metrics, underscoring the value of dataset diversity and robust face detection. The findings suggest that, for emotion-aware synthesis under uncontrolled imaging conditions, both richer training data and the choice of face detector affect the quality of the learned expressive space.

Abstract

In emotion synthesis, the ability to create authentic and nuanced facial expressions continues to gain importance. The GANmut study introduced a GAN framework that, instead of relying on predefined labels, learns a dynamic and interpretable emotion space. This approach represents each discrete emotion as a vector starting from a neutral state, with the vector's magnitude reflecting the emotion's intensity. The current project extends the study of this framework by benchmarking it across various datasets, image resolutions, and face detection methods. It involves a series of experiments using two emotion datasets: Aff-Wild2 and AffNet. Aff-Wild2 contains videos captured in uncontrolled environments, with diverse camera angles, head positions, and lighting conditions, providing a real-world challenge. AffNet offers images with labelled emotions, increasing the diversity of emotional expressions available for training. The first two experiments train GANmut on the Aff-Wild2 dataset, processed with either RetinaFace or MTCNN, both of which are high-performance deep learning face detectors. This setup helps determine how well GANmut can learn to synthesise emotions under challenging conditions and allows the two face detectors to be compared. The subsequent two experiments merge the Aff-Wild2 and AffNet datasets, combining the real-world variability of Aff-Wild2 with the diverse emotional labels of AffNet. The same face detectors, RetinaFace and MTCNN, are used to evaluate whether the added diversity of the combined dataset improves GANmut's performance and to compare the impact of each face detection method in this hybrid setup.
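The polar emotion space described above can be illustrated with a minimal Python sketch. It is not the GANmut implementation: the emotion names, the evenly spaced angles, and the neutral radius are placeholder assumptions, whereas in GANmut the emotion directions are learned during training. The sketch only shows how an angle can encode the emotion category and a radius its intensity.

```python
import math

# Hypothetical emotion directions in radians. In GANmut these directions are
# learned; the evenly spaced angles below are placeholders for illustration.
EMOTIONS = ["happy", "surprised", "fearful", "angry", "disgusted", "sad"]
EMOTION_ANGLES = {
    name: 2 * math.pi * i / len(EMOTIONS) for i, name in enumerate(EMOTIONS)
}

# Assumed threshold: codes closer to the origin than this count as neutral.
NEUTRAL_RADIUS = 0.1


def polar_to_condition(theta, rho):
    """Convert a polar code (theta, rho) into a 2D Cartesian conditioning vector."""
    return (rho * math.cos(theta), rho * math.sin(theta))


def angular_distance(a, b):
    """Smallest absolute difference between two angles, with wrap-around."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def nearest_emotion(theta, rho):
    """Name the emotion whose direction is closest to theta, or 'neutral' near the origin."""
    if rho < NEUTRAL_RADIUS:
        return "neutral"
    return min(
        EMOTION_ANGLES,
        key=lambda name: angular_distance(theta, EMOTION_ANGLES[name]),
    )


def intensity_sweep(emotion, steps=5):
    """Conditioning vectors from neutral (rho = 0) to full intensity (rho = 1)."""
    theta = EMOTION_ANGLES[emotion]
    return [polar_to_condition(theta, k / (steps - 1)) for k in range(steps)]
```

Under these assumptions, `intensity_sweep("happy")` yields a sequence of conditioning vectors that move outward from the neutral origin along one direction, which corresponds to increasing the intensity of a single emotion. Moving `theta` between two directions at a fixed `rho` would correspond to blending between emotions.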

Paper Structure (19 sections, 7 equations, 10 figures, 5 tables)

This paper contains 19 sections, 7 equations, 10 figures, 5 tables.

Introduction
Related Work
Deep Learning and Generative Adversarial Networks
Facial Expressions: Label Conditioning
GANs Latent Spaces
Problem Formulation
Methodology
Linear Model
Face Detectors
RetinaFace
MTCNN
Data Sources
Training
Numerical Results
VGGNet and ResNet
...and 4 more sections

Figures (10)

Figure 1: Valence-Arousal 2D space
Figure 2: Visualization of the facial action units (AUs) used to express happiness (left) and anger (right). AU4: brow lowerer, AU5: upper lid raiser, AU6: cheek raiser, AU7: lid tightener, AU10: upper lip raiser, AU12: lip corner puller, AU25: lip part, AU26: jaw drop [affnet]
Figure 3: Left to right: surprised, sadly surprised, angrily surprised [Du2014CompoundFE]
Figure 4: Gamut of Emotions, built using [song2018selective].
Figure 5: GANmut Architecture and Flow.
...and 5 more figures
