Prior-based Objective Inference Mining Potential Uncertainty for Facial Expression Recognition

Hanwei Liu, Huiling Cai, Qingcheng Lin, Xuefeng Li, Hui Xiao

TL;DR

The paper proposes a Prior-based Objective Inference (POI) network that uses prior knowledge to derive a more objective and varied emotional distribution, mitigates subjective annotation ambiguity through dynamic knowledge transfer, and adds an uncertainty estimation module to quantify and balance facial expression confidence.

Abstract

Annotation ambiguity caused by the inherent subjectivity of visual judgment has long been a major challenge for Facial Expression Recognition (FER), particularly for large-scale datasets collected in the wild. A potential solution is to estimate relatively objective emotional distributions that help mitigate the ambiguity of subjective annotations. To this end, this paper proposes a novel Prior-based Objective Inference (POI) network. The network employs prior knowledge to derive a more objective and varied emotional distribution and tackles subjective annotation ambiguity through dynamic knowledge transfer. POI comprises two key networks. First, the Prior Inference Network (PIN) uses prior knowledge of facial Action Units (AUs) and emotions to capture intricate motion details. To reduce over-reliance on priors and facilitate objective emotional inference, PIN aggregates inferential knowledge from several key facial subregions, encouraging mutual learning. Second, the Target Recognition Network (TRN) integrates the subjective emotion annotations with the objective inference soft labels provided by the PIN, fostering an understanding of the inherent diversity of facial expressions and thereby resolving annotation ambiguity. Moreover, we introduce an uncertainty estimation module to quantify and balance facial expression confidence, which enables a flexible approach to handling the uncertainty of subjective annotations. Extensive experiments show that POI achieves competitive performance on both synthetic noisy datasets and multiple real-world datasets. All code and training logs will be publicly available at https://github.com/liuhw01/POI.
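
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the per-region predictions are averaged into a soft label, that agreement between regions can serve as a rough confidence signal, and that the training target mixes the one-hot annotation with the soft label through a weight `alpha`. The function names, the averaging rule, the total-variation consistency score, and all numbers are hypothetical.

```python
import math

# Class order taken from the Figure 1 caption: Su, Fe, Di, Ha, Sa, An, Ne.
EMOTIONS = ["Su", "Fe", "Di", "Ha", "Sa", "An", "Ne"]

def aggregate_regions(region_probs):
    """Average per-region emotion distributions into one soft label (assumed rule)."""
    n = len(region_probs)
    k = len(region_probs[0])
    return [sum(p[c] for p in region_probs) / n for c in range(k)]

def consistency(region_probs):
    """Hypothetical confidence score in [0, 1]: one minus the mean
    total-variation distance between each region and the aggregate."""
    soft = aggregate_regions(region_probs)
    tv = [0.5 * sum(abs(p[c] - soft[c]) for c in range(len(soft)))
          for p in region_probs]
    return 1.0 - sum(tv) / len(tv)

def blended_target(label, soft, alpha):
    """Mix the one-hot annotation (weight alpha) with the soft label (weight 1 - alpha)."""
    return [alpha * (1.0 if c == label else 0.0) + (1.0 - alpha) * s
            for c, s in enumerate(soft)]

def cross_entropy(target, q, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction q."""
    return -sum(t * math.log(max(qc, eps)) for t, qc in zip(target, q))

if __name__ == "__main__":
    # Made-up predictions for four regions p_1..p_4 on one image.
    regions = [
        [0.50, 0.30, 0.02, 0.03, 0.05, 0.05, 0.05],
        [0.40, 0.40, 0.02, 0.03, 0.05, 0.05, 0.05],
        [0.55, 0.25, 0.02, 0.03, 0.05, 0.05, 0.05],
        [0.45, 0.35, 0.02, 0.03, 0.05, 0.05, 0.05],
    ]
    soft = aggregate_regions(regions)
    conf = consistency(regions)
    # Using the consistency score as alpha is an illustrative choice only.
    target = blended_target(EMOTIONS.index("Su"), soft, alpha=conf)
    q = soft  # stand-in for the target network's prediction
    print([round(t, 3) for t in target], round(cross_entropy(target, q), 3))
```

With a mixed target of this kind, a sample annotated Surprise whose regions also suggest Fear keeps some probability mass on Fear instead of being forced onto a single class, which is the sort of diversity the soft labels are meant to preserve.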

Paper Structure

This paper contains 27 sections, 12 equations, 15 figures, 12 tables.

Figures (15)

  • Figure 1: Subjective voting results and AU-based judgment results for emotion categories. (a) and (c) show the votes of 20 volunteers on facial emotion categories, demonstrating the uncertainty of observers' subjective judgment. (b) and (d) show the emotion category annotations derived from prior knowledge. Su: Surprise, Fe: Fear, Di: Disgust, Ha: Happy, Sa: Sadness, An: Anger, and Ne: Neutral.
  • Figure 2: Structure of the proposed method. POI consists of a shared feature extractor, a Prior Inference Network, and a Target Recognition Network, which are designed to learn potentially diverse emotions and thereby address annotation ambiguity. $p_{1}$, $p_{2}$, …, $p_{4}$ denote the emotion prediction distributions of the discriminant regions; $\widetilde{p}^{*}$ denotes the intermediate prediction distribution; and $q$ denotes the prediction distribution of the target network.
  • Figure 3: AU groups highly correlated with emotion in the key regions of the left eye and the right side of the mouth, which show active muscle movements near the eyes and mouth.
  • Figure 4: Expression consistency estimation for all facial subregions $\Delta_{sub}$. High (Low) consistency corresponds to high (low) confidence.
  • Figure 5: Sample images from the dataset. Red boxes mark samples whose emotion labels are ambiguous, and orange boxes mark samples whose labels are confident.
  • ...and 10 more figures