Coupling deep and handcrafted features to assess smile genuineness
Benedykt Pawlus, Bogdan Smolka, Jolanta Kawulok, Michal Kawulok
TL;DR
The paper tackles smile genuineness recognition from video by fusing handcrafted facial action unit (AU) dynamics (AUDA) with deep features from RealSmileNet in a late-fusion framework. It introduces frame-wise AUDA streams and phase-wise AU dynamics that are combined with deep CNN-LSTM representations, forming four parallel models whose outputs are concatenated for final classification. Empirical results on the UvA-NEMO dataset show that the AUDA features alone can outperform the deep features, and that fusing the two yields the highest accuracy while remaining capable of real-time processing on standard GPUs. The work contributes interpretable AU-based cues, demonstrates the value of combining handcrafted dynamics with deep features for emotion-related tasks, and points to future directions in dynamics-focused network designs and super-resolution preprocessing.
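The late-fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stream names, output sizes, random weights, and the logistic classifier are all assumptions chosen for illustration. It only shows the general pattern of concatenating the outputs of four parallel models and classifying the fused vector.

```python
import numpy as np

def late_fusion_score(stream_outputs, weights, bias):
    """Concatenate per-stream outputs and score them with a logistic classifier.

    Returns a value in (0, 1); here it is read as the probability that
    the smile is genuine (an assumed labeling convention).
    """
    fused = np.concatenate([np.ravel(o) for o in stream_outputs])
    logit = float(fused @ weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)

# Hypothetical output sizes of the four parallel models (not from the paper).
stream_dims = {"stream_0": 8, "stream_1": 4, "stream_2": 4, "stream_3": 4}

# Stand-ins for the outputs the four models would produce for one video.
outputs = [rng.normal(size=d) for d in stream_dims.values()]

# Stand-ins for trained classifier parameters.
weights = rng.normal(size=sum(stream_dims.values()))
bias = 0.0

probability = late_fusion_score(outputs, weights, bias)
print(f"fused score: {probability:.3f}")
```

In a sketch like this, each stream can be trained and inspected on its own, which is one reason late fusion pairs naturally with interpretable handcrafted cues.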
Abstract
Assessing smile genuineness from video sequences is an important problem concerned with recognizing facial expressions and linking them with the underlying emotional states. A number of techniques have been proposed that are based on handcrafted features, as well as others that rely on deep learning to extract useful features. As both approaches have certain benefits and limitations, in this work we propose to combine the features learned by a long short-term memory network with features handcrafted to capture the dynamics of facial action units. The results of our experiments indicate that the proposed solution is more effective than the baseline techniques and that it allows smile genuineness to be assessed from video sequences in real time.
