eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis
Omar Ikne, Benjamin Allaert, Ioan Marius Bilasco, Hazem Wannous
TL;DR
The paper tackles the degradation of facial expression recognition (FER) performance under head pose variations by introducing eMotion-GAN, a motion-domain frontal view synthesis framework. It decomposes the task into two stages: frontalizing facial motion with $G_{F}$ (guided by $D_{P}$ and $D_{E}$) and warping this motion into an expressive frontal face via $G_{W}$ (monitored by $e_{FER}$). Key contributions include a novel flow-based frontalization approach, a dual-discriminator setup to preserve motion and expressions, and an end-to-end training scheme augmented by EndPoint and Charbonnier losses. Empirical results on SNaP-2DFe and auxiliary FER datasets show a reduced FER gap between frontal and non-frontal faces, with FER improvements of up to +$5$% for small pose variations and up to +$20$% for larger ones. The paper also demonstrates cross-subject motion transfer and reports ablation studies. The work advances robust FER under real-world pose variations and offers potential data augmentation and synthesis utilities, while acknowledging misuse risks in deepfake contexts.
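For readers unfamiliar with the two auxiliary losses named above, the following is a minimal NumPy sketch of their standard textbook definitions. It is not taken from the authors' code: the function names, the `(H, W, 2)` flow layout, and the `eps` default are assumptions made for illustration, and the paper's exact formulation and weighting may differ.

```python
import numpy as np


def endpoint_error(flow_pred, flow_true):
    """Mean Euclidean distance between two flow fields.

    Both inputs are assumed to have shape (H, W, 2), holding the
    horizontal and vertical displacement of each pixel.
    """
    diff = flow_pred - flow_true
    # Per-pixel L2 norm of the displacement difference, averaged over the image.
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))


def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier penalty: a smooth, differentiable variant of the L1 loss.

    eps is a hypothetical default; it keeps the gradient finite at zero.
    """
    diff = pred - target
    return float(np.mean(np.sqrt(diff ** 2 + eps ** 2)))


if __name__ == "__main__":
    zero_flow = np.zeros((2, 2, 2))
    shifted_flow = np.zeros((2, 2, 2))
    shifted_flow[..., 0] = 3.0  # horizontal displacement
    shifted_flow[..., 1] = 4.0  # vertical displacement
    print(endpoint_error(zero_flow, shifted_flow))
    print(charbonnier_loss(zero_flow, zero_flow))
```

The endpoint error compares flow vectors directly, which suits supervision in the motion domain, while the Charbonnier penalty behaves like an L1 loss for large residuals and stays smooth near zero.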
Abstract
Many existing facial expression recognition (FER) systems encounter substantial performance degradation when faced with variations in head pose. Numerous frontalization methods have been proposed to enhance these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis while preserving facial expressions within the motion domain. Considering the motion induced by head variation as noise and the motion induced by facial expression as the relevant information, our model is trained to filter out the noisy motion in order to retain only the motion related to facial expression. The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face. We conducted extensive evaluations using several widely recognized dynamic FER datasets, which encompass sequences exhibiting various degrees of head pose variations in both intensity and orientation. Our results demonstrate the effectiveness of our approach in significantly reducing the FER performance gap between frontal and non-frontal faces. Specifically, we achieved a FER improvement of up to +5\% for small pose variations and up to +20\% improvement for larger pose variations. Code available at \url{https://github.com/o-ikne/eMotion-GAN.git}.
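To make the idea of mapping a motion field onto a neutral face concrete, here is a minimal sketch of classical backward warping with nearest-neighbour sampling. It is only an illustrative stand-in: in the paper this step is performed by a learned generator, and the function name `backward_warp`, the `(H, W, 2)` flow layout, and the convention that the flow gives, for each output pixel, the offset of the source pixel to sample are all assumptions of this example.

```python
import numpy as np


def backward_warp(image, flow):
    """Warp `image` with a dense flow field using nearest-neighbour sampling.

    image: array of shape (H, W) or (H, W, C), e.g. a neutral frontal face.
    flow:  array of shape (H, W, 2); flow[y, x] = (dx, dy) is the offset,
           relative to (x, y), of the source pixel copied to output (x, y).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates, rounded to the nearest pixel and clamped to the image.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


if __name__ == "__main__":
    neutral = np.arange(9).reshape(3, 3)
    flow = np.zeros((3, 3, 2))
    flow[..., 0] = 1.0  # every output pixel samples its right-hand neighbour
    print(backward_warp(neutral, flow))
```

A fixed warp like this can only move existing pixels around and cannot synthesize texture that the neutral face does not contain, which is one reason a learned warping stage is attractive for photorealistic synthesis.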
