Let's Get the FACS Straight -- Reconstructing Obstructed Facial Features
Tim Büchner, Sven Sickert, Gerd Fabian Volk, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler
TL;DR
The paper addresses facial analysis under obstruction by removing surface electromyography (sEMG) sensors from video frames with unpaired, CycleGAN-style image translation. This avoids fine-tuning a separate model for every analysis task. By treating the presence of sensors as a style shift, the authors reconstruct clean facial features (via $G_{S \mapsto N}$) while preserving identity and expression, so that downstream Action Unit (AU) and emotion analyses can be applied unchanged. Perceptual metrics (LPIPS, FID) and downstream tasks (AU detection with RDF and JAA-Net, emotion detection with ResMaskNet) show restoration quality approaching, and sometimes exceeding, that of the unobstructed baseline videos. The approach lets existing facial analysis methods run on obstructed data, with subject-specific models providing robustness across individuals and recording conditions.
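To make the unpaired setup concrete, the generic CycleGAN objective can be written in the TL;DR's notation. This is a sketch under assumptions: $S$ is read as the domain of frames with sensors and $N$ as the domain of frames without them. The reverse generator $G_{N \mapsto S}$, the discriminators $D_S$ and $D_N$, and the weight $\lambda$ are names introduced here. The paper may also use additional terms, such as an identity loss.

$$
\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{s \sim S}\big[\lVert G_{N \mapsto S}(G_{S \mapsto N}(s)) - s \rVert_1\big] + \mathbb{E}_{n \sim N}\big[\lVert G_{S \mapsto N}(G_{N \mapsto S}(n)) - n \rVert_1\big]
$$

$$
\mathcal{L} = \mathcal{L}_{\mathrm{GAN}}(G_{S \mapsto N}, D_N) + \mathcal{L}_{\mathrm{GAN}}(G_{N \mapsto S}, D_S) + \lambda \, \mathcal{L}_{\mathrm{cyc}}
$$

Because $\mathcal{L}_{\mathrm{cyc}}$ only requires each frame to survive a round trip through both generators, no frame needs a matched counterpart in the other domain.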
Abstract
The human face is one of the most crucial elements of interpersonal communication. Even when parts of the face are hidden or obstructed, the underlying facial movements can still be understood. Machine learning approaches often fail in this regard due to the complexity of facial structures. To alleviate this problem, a common approach is to fine-tune a model for the specific application. However, this is computationally intensive and might have to be repeated for each desired analysis task. In this paper, we propose to reconstruct obstructed facial parts to avoid repeated fine-tuning. As a result, existing facial analysis methods can be applied to the data without further changes. In our approach, the restoration of facial features is interpreted as a style transfer task between different recording setups. By using the CycleGAN architecture, we eliminate the requirement for matched image pairs, which is often hard to fulfill. To prove the viability of our approach, we compare our reconstructions with real unobstructed recordings. We created a novel data set in which 36 test subjects were recorded both with and without 62 surface electromyography sensors attached to their faces. In our evaluation, we consider typical facial analysis tasks, such as the computation of Facial Action Units and the detection of emotions. To further assess the quality of the restoration, we also compare perceptual distances. We show that scores similar to those of the videos without obstructing sensors can be achieved.
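One of the perceptual distances used, FID, compares two image sets through the Fréchet distance between Gaussians fitted to their feature embeddings: $\lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2})$. The sketch below is a simplified, dependency-free illustration that assumes diagonal covariances. Under that assumption, the trace term reduces to $\sum_i (\sigma_{1,i} - \sigma_{2,i})^2$. Real FID uses the full covariance of Inception-v3 features. The helper name `frechet_distance_diag` is hypothetical, and the feature vectors are made-up toy values.

```python
from statistics import fmean, stdev


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two Gaussians fitted to feature vectors.

    Hypothetical helper: assumes diagonal covariance, a simplification of
    full FID, which uses the full covariance of Inception-v3 features.
    Each argument is a list of equal-length feature vectors (at least two).
    """
    dims = len(feats_a[0])
    total = 0.0
    for i in range(dims):
        col_a = [vec[i] for vec in feats_a]
        col_b = [vec[i] for vec in feats_b]
        # Squared difference of the per-dimension means.
        mean_term = (fmean(col_a) - fmean(col_b)) ** 2
        # With diagonal covariances, the trace term
        # var_a + var_b - 2 * sqrt(var_a * var_b) equals (sd_a - sd_b) ** 2.
        # stdev uses the sample (n - 1) estimate.
        cov_term = (stdev(col_a) - stdev(col_b)) ** 2
        total += mean_term + cov_term
    return total


# Toy, made-up feature vectors for restored and unobstructed frames.
restored = [[0.0, 1.0], [2.0, 3.0]]
unobstructed = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diag(restored, unobstructed))  # 1.0
```

Lower values mean the two feature distributions are closer. In the setting described above, the comparison of interest would be features of restored frames against features of real recordings without sensors.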
