GiMeFive: Towards Interpretable Facial Emotion Classification

Jiawen Wang, Leah Kawka

TL;DR

GiMeFive tackles facial emotion recognition (FER) with a focus on interpretability. It aggregates five FER datasets into a single training regime and uses Grad-CAM alongside 68 facial landmarks to explain decisions, while maintaining competitive accuracy on RAF-DB and FER2013 with a compact architecture. The work shows that a carefully regularized, moderately deep CNN can combine strong performance with transparent reasoning, as evidenced by Grad-CAM heatmaps and landmark overlays on both images and video frames. The public release of code and demonstrations makes the approach practical for real-time FER applications that require explainable decisions.

Abstract

In recent years, deep convolutional neural networks have been shown to successfully recognize facial emotions in computer vision. However, existing detection approaches are not always reliable or explainable. We therefore propose our model GiMeFive with interpretations, i.e., via layer activations and gradient-weighted class activation mapping (Grad-CAM). We compare against state-of-the-art methods for classifying the six facial emotions. Empirical results show that our model outperforms previous methods in terms of accuracy on two Facial Emotion Recognition (FER) benchmarks and on our aggregated FER GiMeFive. Furthermore, we explain our work on real-world image and video examples, as well as on real-time live camera streams. Our code and supplementary material are available at https://github.com/werywjw/SEP-CVDL.
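To make the interpretation step concrete, the following is a minimal sketch of the standard gradient-weighted class activation mapping computation. It is not the authors' implementation: the function name `grad_cam` and the nested-list data layout are illustrative assumptions. It further assumes that the feature maps of one convolutional layer, and the gradients of the target class score with respect to them, have already been extracted from the network.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one convolutional layer.

    activations: K feature maps, each an H x W nested list of floats.
    gradients:   gradients of the class score w.r.t. those maps (same shape).
    Returns an H x W heatmap scaled to [0, 1].
    (Hypothetical helper; names and data layout are assumptions.)
    """
    # Channel weight: global average of the gradients over all positions.
    weights = []
    for grad in gradients:
        total = sum(sum(row) for row in grad)
        count = len(grad) * len(grad[0])
        weights.append(total / count)

    height = len(activations[0])
    width = len(activations[0][0])

    # Weighted sum of the feature maps, followed by ReLU so that only
    # regions with a positive influence on the class score remain.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

In practice the activations and gradients would come from hooks on the last convolutional layer of the trained model, and the resulting map would be upsampled to the input resolution before being overlaid on the face image.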

Paper Structure (25 sections, 2 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Test accuracies (%) of our GiMeFive compared to other state-of-the-art models on the RAF-DB dataset (see \ref{tab:model} for full results).
  • Figure 2: Overview of the experimental pipeline.
  • Figure 3: Overview of the GiMeFive model architecture (see \ref{fig:modeldetail} for a detailed version).
  • Figure 4: Overview of the confusion matrix evaluated from our GiMeFive on the validation set.
  • Figure 5: Interpreting images with all six facial emotional classes (left: original image; middle: heatmap; right: Grad-CAM) evaluated from our GiMeFive on the validation set.
  • ...and 3 more figures