AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels

Ali Pourramezan Fard, Mohammad Mehdi Hosseini, Timothy D. Sweeny, Mohammad H. Mahoor

TL;DR

This work introduces soft-labels for facial expression datasets, an approach to affective computing aimed at more realistic recognition of facial expressions, and presents AffectNet+, the next-generation facial expression dataset.

Abstract

Automated Facial Expression Recognition (FER) is challenging due to intra-class variations and inter-class similarities. FER can be especially difficult when facial expressions reflect a mixture of several emotions (also known as compound expressions). Existing FER datasets, such as AffectNet, provide discrete emotion labels (hard-labels), where a single category of emotion is assigned to an expression. To alleviate inter- and intra-class challenges, and to provide a better facial expression descriptor, we propose a new approach to creating FER datasets through a labeling method in which an image is labeled with more than one emotion (called soft-labels), each with its own confidence. Specifically, we introduce the notion of soft-labels for facial expression datasets, a new approach to affective computing for more realistic recognition of facial expressions. To achieve this goal, we propose a novel methodology to accurately calculate soft-labels: a vector representing the extent to which multiple categories of emotion are simultaneously present within a single facial expression. The advantages of the proposed method include smoother decision boundaries, support for multi-labeling, and mitigation of bias and data imbalance. Building upon AffectNet, we introduce AffectNet+, the next-generation facial expression dataset. This dataset contains soft-labels, three subsets grouped by data complexity, and additional metadata such as age, gender, ethnicity, head pose, facial landmarks, valence, and arousal. AffectNet+ will be made publicly accessible to researchers.
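
To make the distinction between hard-labels and soft-labels concrete, the following is a minimal Python sketch rather than the authors' method. It assumes the eight AffectNet expression categories and treats each soft-label entry as an independent confidence in [0, 1]. The confidence values, the threshold, and the helper names `hard_label` and `present_emotions` are hypothetical and chosen only for illustration.

```python
# Illustrative only: category names follow AffectNet's eight expressions;
# the confidence values below are made up, not taken from AffectNet+.
EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise",
            "Fear", "Disgust", "Anger", "Contempt"]


def hard_label(soft):
    """Collapse a soft-label vector to the single most confident emotion."""
    best = max(range(len(soft)), key=lambda i: soft[i])
    return EMOTIONS[best]


def present_emotions(soft, threshold=0.3):
    """Return every emotion whose confidence reaches a hypothetical threshold."""
    return [name for name, score in zip(EMOTIONS, soft) if score >= threshold]


# A hypothetical compound expression: mostly surprise with some fear.
soft = [0.05, 0.10, 0.02, 0.80, 0.45, 0.03, 0.04, 0.01]

print(hard_label(soft))        # single-category (hard-label) view
print(present_emotions(soft))  # multi-label view enabled by the soft-label
```

The hard-label view keeps only the dominant category, while the soft-label vector retains the secondary emotion that a compound expression carries.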

Paper Structure

This paper contains 33 sections, 15 equations, 11 figures, 21 tables.

Figures (11)

  • Figure 1: Unlike the traditional approaches, where a single emotion label is assigned to each image, we introduce soft-labels to provide a more comprehensive assessment by considering multiple emotions and indicating the confidence of each emotion's presence in a given face.
  • Figure 2: Architecture of the ensemble of binary classifiers (EBC model), the initial step of the soft-labeling process. For each expression, it contains an ensemble of three classifiers: ResNet-50 (He et al., 2016), EfficientNet-B3 (Tan and Le, 2019), and XceptionNet (Chollet, 2017). There are eight instances of this architecture, one per expression, each trained in a binary one-vs-rest manner. Finally, their outputs are aggregated to form the expression vector.
  • Figure 3: Architecture of the AU-based classifier for each expression, the second model of the soft-labeling process. For each emotion class, a multi-head ResNet-50 (He et al., 2016) classifier is trained to learn the AU features and the expression features simultaneously, so that each model captures the relation between the expressions and the AUs. There are eight instances of this architecture, one per expression. As in the initial model (EBC), each expression is trained in a binary one-vs-rest manner, and the outputs are aggregated to form the expression vector.
  • Figure 4: Distribution of the AffectNet+ sets, including training and validation sets, over different Easy, Challenging, and Difficult subsets.
  • Figure 5: Confusion matrix of the baseline model (ResNet-50; He et al., 2016) for each subset of AffectNet+ (Easy, Challenging, and Difficult). A separate baseline model is trained on each subset. Each model is then evaluated on all samples in the evaluation set, regardless of their subset.
  • ...and 6 more figures
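
The caption of Figure 2 describes eight one-vs-rest ensembles whose outputs are aggregated into an expression vector. The sketch below illustrates that flow under stated assumptions: the captions do not say how the three classifiers in each ensemble are combined, so a plain mean is used here as a stand-in. The probabilities and the helper names `ensemble_confidence` and `expression_vector` are hypothetical.

```python
from statistics import mean

EMOTIONS = ["Neutral", "Happy", "Sad", "Surprise",
            "Fear", "Disgust", "Anger", "Contempt"]


def ensemble_confidence(member_probs):
    """Combine the three binary classifiers of one expression.

    Assumption: a simple average; the source does not specify the rule.
    """
    return mean(member_probs)


def expression_vector(per_expression_probs):
    """Stack the eight one-vs-rest confidences in a fixed emotion order."""
    return [ensemble_confidence(per_expression_probs[name]) for name in EMOTIONS]


# Hypothetical outputs of the three binary classifiers for each expression
# (standing in for ResNet-50, EfficientNet-B3, and XceptionNet).
probs = {name: [0.0, 0.0, 0.0] for name in EMOTIONS}
probs["Happy"] = [0.9, 0.8, 0.7]
probs["Surprise"] = [0.2, 0.4, 0.3]

vector = expression_vector(probs)
print(dict(zip(EMOTIONS, vector)))
```

Because each ensemble is trained one-vs-rest, the resulting entries are independent confidences and need not sum to one in this sketch.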