Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models

Israel A. Laurensi, Alceu de Souza Britto, Jean Paul Barddal, Alessandro Lameiras Koerich

TL;DR

The paper tackles catastrophic forgetting in facial expression recognition (FER) by introducing emotion-centered generative replay (ECgr), which uses images generated by class-specific WGAN-GPs, together with a quality assurance (QA) filter, to retain past knowledge while learning new emotions. It couples ECgr with a weighted loss that accounts for the confidence of the CNN and evaluates the approach across FER datasets, showing that ECgr, especially when paired with QA, approaches joint-training performance without access to all past data. The contributions include a novel pseudo-rehearsal framework for FER, a QA-based filtering mechanism, and a structured offline/continual-learning pipeline with a loss formulation that incorporates image quality. The findings suggest that this strategy effectively mitigates forgetting and offers a practical path for continual FER in dynamic settings, at the price of higher computational cost and sensitivity to the choice of weights.
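
The weighted loss mentioned above can be illustrated with a minimal sketch. The paper's exact formulation is not reproduced in this summary, so everything here is an assumption: the function name `weighted_cross_entropy`, the idea that real images get weight 1.0 while synthetic images get a confidence or quality score in [0, 1], and the normalization by the total weight.

```python
import math


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-sample cross-entropy.

    probs   -- list of per-class probability lists, one row per image
    labels  -- list of integer class indices
    weights -- per-sample weights (hypothetical scheme: 1.0 for real
               images, a confidence/quality score for synthetic ones)
    """
    if not (len(probs) == len(labels) == len(weights)):
        raise ValueError("probs, labels, and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(max(p[y], eps))
    # Normalizing by the total weight is an assumption of this sketch.
    return total / total_weight
```

Under this scheme a synthetic image with weight 0.0 contributes nothing to the loss, and one with weight 1.0 counts as much as a real image.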

Abstract

Facial expression recognition is a pivotal component in machine learning, facilitating various applications. However, convolutional neural networks (CNNs) are often plagued by catastrophic forgetting, impeding their adaptability. The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks. Moreover, ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images. This dual approach enables CNNs to retain past knowledge while learning new tasks, enhancing their performance in emotion recognition. The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method enhances training on the targeted dataset and the source dataset while making the CNN retain previously learned knowledge.
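
As a rough illustration of the quality assurance step, the sketch below keeps a synthetic image only if a classifier assigns it to the emotion class its generator was trained on, with sufficient confidence. The abstract does not state the acceptance rule, so this criterion, the function name `qa_filter`, and the threshold `min_confidence` are hypothetical.

```python
def qa_filter(class_probs, intended_class, min_confidence=0.9):
    """Return the indices of synthetic images that pass the check.

    class_probs    -- list of per-class probability lists, one row per
                      synthetic image, as produced by some classifier
    intended_class -- class index the generator was trained to produce
    min_confidence -- hypothetical acceptance threshold
    """
    kept = []
    for i, probs in enumerate(class_probs):
        # Index of the most probable class for this image.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == intended_class and probs[predicted] >= min_confidence:
            kept.append(i)
    return kept


# Three synthetic "happy" images (class 0) scored by a two-class classifier.
scores = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
accepted = qa_filter(scores, intended_class=0)
```

In this usage example only the first image is accepted: the second is predicted correctly but falls below the threshold, and the third is assigned to the wrong class.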

Paper Structure (13 sections, 1 equation, 6 figures, 3 tables, 2 algorithms)

Figures (6)

  • Figure 1: An overview of the proposed method reveals two key components. At the top, the emotion-centered WGAN-GP with CNN QA is depicted. This component involves training a WGAN-GP for each class in the source dataset to generate synthetic data resembling that class. At the bottom, the fine-tuning strategy is illustrated, where our synthetic dataset is replayed alongside the target dataset.
  • Figure 2: Sample results for different classes from the synthetic datasets generated by the WGAN-GP for MUG, JAFFE, and TFEID. The first column (in green) shows original samples from the MUG, TFEID, and JAFFE datasets, from top to bottom. The second through last columns (in orange) show the corresponding synthetic images for each dataset.
  • Figure 3: Some rejected samples identified by the QA algorithm from the synthetic datasets of MUG, JAFFE, and TFEID.
  • Figure 4: Accuracy results on the MUG dataset, showcasing the continuous adaptation of a trained CNN across JAFFE, TFEID, and CK+ datasets relative to the baseline accuracy.
  • Figure 5: Accuracy comparison for the JAFFE and TFEID datasets.
  • ...and 1 more figure
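
The fine-tuning strategy described in the Figure 1 caption, where the synthetic dataset is replayed alongside the target dataset, can be sketched as a simple data-mixing step. The function name `build_replay_set`, the `replay_ratio` parameter, and the uniform sampling are assumptions made for illustration, not the paper's procedure.

```python
import random


def build_replay_set(target_samples, synthetic_samples, replay_ratio=1.0, seed=0):
    """Mix target-dataset samples with replayed synthetic samples.

    replay_ratio -- hypothetical knob: number of synthetic samples to
                    replay, as a fraction of the target dataset size
    """
    rng = random.Random(seed)
    n_replay = min(len(synthetic_samples),
                   int(replay_ratio * len(target_samples)))
    # Draw synthetic samples without replacement, then shuffle the union
    # so each fine-tuning batch can contain both kinds of data.
    replay = rng.sample(list(synthetic_samples), n_replay)
    mixed = list(target_samples) + replay
    rng.shuffle(mixed)
    return mixed


target = [("target", i) for i in range(4)]
synthetic = [("synthetic", i) for i in range(10)]
mixed = build_replay_set(target, synthetic, replay_ratio=0.5)
```

Here the mixed set holds all four target samples plus two synthetic ones, so the CNN sees source-like data during fine-tuning without access to the original source images.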