GANmut: Generating and Modifying Facial Expressions
Maria Surani
TL;DR
This work extends GANmut by benchmarking its continuous, learnable emotion space for facial expression synthesis on in-the-wild data. GANmut uses a polar latent space $Z=(\theta,\rho)$ within a multi-domain GAN to capture both emotion category and intensity, and is trained with adversarial, classification, regression, interpolation, and reconstruction losses. Across Aff-Wild2 and the AffNet-augmented dataset, with faces detected by either RetinaFace or MTCNN, GANmut achieves improved qualitative realism and competitive FED/Smoothness metrics, underscoring the value of dataset diversity and robust face detection. The findings suggest that, for emotion-aware synthesis under uncontrolled imaging conditions, richer training data and the choice of face detector both strengthen the learned expression space.
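As a rough illustration of the polar latent space $Z=(\theta,\rho)$ mentioned above, the sketch below converts a polar code into a 2-D conditioning vector and looks up the closest emotion direction. It is a minimal sketch under stated assumptions, not GANmut's implementation: the function names, the Cartesian mapping, and the fixed angles in `EMOTION_ANGLES` are all hypothetical, and in GANmut the emotion directions are learned rather than set by hand.

```python
import math

# Hypothetical emotion directions in radians, for illustration only.
# In GANmut these directions are learned during training, not fixed.
EMOTION_ANGLES = {
    "happy": 0.0,
    "surprised": math.pi / 2,
    "sad": math.pi,
    "angry": 3 * math.pi / 2,
}


def polar_to_condition(theta, rho):
    """Map a polar code (theta, rho) to a 2-D Cartesian conditioning vector.

    theta selects the emotion direction and rho its intensity;
    rho = 0 corresponds to the neutral expression at the origin.
    """
    return (rho * math.cos(theta), rho * math.sin(theta))


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def nearest_emotion(theta):
    """Return the (assumed) emotion whose direction is closest to theta."""
    return min(
        EMOTION_ANGLES,
        key=lambda name: angular_distance(theta, EMOTION_ANGLES[name]),
    )
```

Under these assumptions, increasing `rho` along a fixed `theta` moves away from the neutral origin and would correspond to a stronger version of the same emotion, while changing `theta` at fixed `rho` moves between emotion categories.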
Abstract
In emotion synthesis, the ability to create authentic and nuanced facial expressions continues to gain importance. The GANmut study introduced a GAN framework that, instead of relying on predefined labels, learns a dynamic and interpretable emotion space. In this space, each discrete emotion is represented as a vector starting from the neutral state, with its magnitude reflecting the emotion's intensity. The current project extends the study of this framework by benchmarking it across datasets, image resolutions, and face detection methods. This will involve a series of experiments using two emotion datasets: Aff-Wild2 and AffNet. Aff-Wild2 contains videos captured in uncontrolled environments, with diverse camera angles, head positions, and lighting conditions, and so poses a real-world challenge. AffNet offers images with labelled emotions, increasing the diversity of emotional expressions available for training. The first two experiments will train GANmut on the Aff-Wild2 dataset, with faces extracted by either RetinaFace or MTCNN, both of which are high-performance deep learning face detectors. This setup will help determine how well GANmut can learn to synthesise emotions under challenging conditions and will allow the two face detectors to be compared. The subsequent two experiments will merge the Aff-Wild2 and AffNet datasets, combining the real-world variability of Aff-Wild2 with the diverse emotional labels of AffNet. The same face detectors, RetinaFace and MTCNN, will be used to evaluate whether the added diversity of the combined dataset improves GANmut's performance and to compare the impact of each face detection method in this hybrid setup.
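The four experiments described above form a 2 x 2 grid of training data and face detector. The sketch below simply enumerates that grid; the dictionary keys, the `build_experiment_grid` helper, and the numbering are illustrative assumptions rather than the project's actual configuration format.

```python
from itertools import product

# Training data and face detectors named in the abstract.
DATASETS = ["Aff-Wild2", "Aff-Wild2 + AffNet"]
DETECTORS = ["RetinaFace", "MTCNN"]


def build_experiment_grid():
    """Enumerate every dataset/detector pairing as one experiment.

    The order follows the abstract: the first two experiments use
    Aff-Wild2 alone, the last two use the merged dataset.
    """
    return [
        {"id": i, "dataset": dataset, "detector": detector}
        for i, (dataset, detector) in enumerate(
            product(DATASETS, DETECTORS), start=1
        )
    ]


if __name__ == "__main__":
    for experiment in build_experiment_grid():
        print(experiment)
```

Holding the detector fixed and comparing across datasets isolates the effect of the added AffNet data, while holding the dataset fixed and comparing across detectors isolates the effect of face detection.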
