GiMeFive: Towards Interpretable Facial Emotion Classification
Jiawen Wang, Leah Kawka
TL;DR
GiMeFive tackles facial emotion recognition (FER) with a focus on interpretability. It aggregates five FER datasets into a single training regime and uses Grad-CAM together with 68-point facial landmarks to explain its decisions, while maintaining competitive accuracy on RAF-DB and FER2013 with a compact architecture. The work demonstrates that a carefully regularized, moderately deep CNN can achieve strong performance and transparent reasoning, as evidenced by Grad-CAM heatmaps and landmark overlays on both images and video frames. The public release of code and demonstrations underlines the practical relevance for real-time FER applications that require explainable decisions.
Abstract
Deep convolutional neural networks have successfully recognized facial emotions in computer vision for several years. However, existing detection approaches are not always reliable or explainable. We therefore propose our model GiMeFive with interpretations, i.e., via layer activations and gradient-weighted class activation mapping (Grad-CAM). We compare against state-of-the-art methods for classifying six facial emotions. Empirical results show that our model outperforms previous methods in terms of accuracy on two Facial Emotion Recognition (FER) benchmarks and on our aggregated FER GiMeFive dataset. Furthermore, we explain our work on real-world image and video examples, as well as on real-time live camera streams. Our code and supplementary material are available at https://github.com/werywjw/SEP-CVDL.
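To make the gradient-weighted class activation mapping mentioned above concrete, the following is a minimal sketch of the standard Grad-CAM combination step, not the authors' implementation. It assumes the feature maps of one convolutional layer and the gradients of the target class score with respect to those maps have already been extracted from a network; the function name `grad_cam` and the nested-list representation are illustrative choices, and a real pipeline would typically use a tensor library instead.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations: list of K feature maps, each an H x W nested list.
    gradients:   list of K maps of the same shape, holding the gradient of
                 the target class score with respect to each activation.
    Returns an H x W nested list scaled to [0, 1].
    """
    # Channel weights: global average of each channel's gradient map.
    weights = [
        sum(sum(row) for row in grad) / (len(grad) * len(grad[0]))
        for grad in gradients
    ]

    height, width = len(activations[0]), len(activations[0][0])

    # Weighted sum over channels, followed by ReLU so that only regions
    # with a positive influence on the class score remain.
    heatmap = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; skip if the map is all zeros.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap
```

The resulting map has the spatial resolution of the chosen layer, so in practice it is upsampled to the input size before being overlaid on the face image.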
