D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition

Haoran Wang; Xinji Mai; Zeng Tao; Xuan Tong; Junxiong Lin; Yan Wang; Jiawen Yu; Boyang Wang; Shaoqi Yan; Qing Zhao; Ziheng Zhou; Shuyong Gao; Wenqiang Zhang

TL;DR

D2SP dynamically purifies DFER datasets of two types of noise, low-quality samples and mislabeled samples, so that only high-quality, correctly labeled data is used in training, and experiments establish its ability to improve performance metrics.

Abstract

State-of-the-art Dynamic Facial Expression Recognition (DFER) has made remarkable progress in mapping facial expressions in video content to emotions, supported by training on large datasets. Yet DFER datasets contain a substantial amount of noisy data. The noise arises from low-quality captures that defy logical labeling and from samples that are mislabeled because of annotation bias, which gives rise to two principal types of uncertainty: uncertainty about data usability and uncertainty about label reliability. To address both, we design a two-stage framework for Seeking Certain data In extensive Uncertain data (SCIU). The framework aims to purge DFER datasets of these uncertainties so that only clean, verified data is used in training. To mitigate low-quality samples, we introduce the Coarse-Grained Pruning (CGP) stage, which assesses sample weights and prunes samples deemed unusable because of their low weight. For samples with incorrect annotations, the Fine-Grained Correction (FGC) stage evaluates prediction stability to rectify mislabeled data. Moreover, SCIU is conceived as a universally compatible, plug-and-play framework that integrates with prevailing DFER methods. Rigorous experiments on prevalent DFER datasets and against numerous benchmark methods substantiate SCIU's capacity to markedly improve performance metrics.
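The pruning idea behind the CGP stage can be illustrated with a minimal sketch. The abstract does not say how sample weights are computed or what cutoff is used, so the function name `coarse_grained_prune`, the `prune_ratio` parameter, and the toy weights below are all hypothetical; this is an illustration under those assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def coarse_grained_prune(
    weights: Sequence[float], prune_ratio: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (kept, pruned) by dropping the lowest-weight fraction.

    `prune_ratio` is an assumed hyperparameter; the source only states that
    low-weight samples are pruned, not how the cutoff is chosen.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_prune = int(len(weights) * prune_ratio)
    # Sort indices by ascending weight so the least usable samples come first.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    pruned = sorted(order[:n_prune])
    kept = sorted(order[n_prune:])
    return kept, pruned


# Toy weights standing in for whatever the model assigns to each video clip.
weights = [0.91, 0.08, 0.75, 0.40, 0.12]
kept, pruned = coarse_grained_prune(weights, prune_ratio=0.4)
print("kept:", kept, "pruned:", pruned)
```

A fixed ratio is only one possible cutoff rule; an absolute weight threshold would fit the same description equally well.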

Paper Structure (14 sections, 14 equations, 3 figures, 5 tables)

Introduction
Related Work
Dynamic Facial Expression Recognition
Learning with Uncertainty
Method
Overview
Coarse-Grained Pruning Stage
Fine-Grained Correction
Experimental Evaluation
Experimental Configuration
Comparison with Existing Methods
Ablation Study
Visualization
Conclusion

Figures (3)

Figure 1: Overview of our proposed Seeking Certainty in Uncertainty (SCIU) framework. Panel (a) outlines SCIU, which comprises two primary stages: Coarse-Grained Pruning (CGP) and Fine-Grained Correction (FGC). The objective of CGP is to eliminate the first type of uncertainty by pruning low-quality, unusable data, resulting in a coarse-grained certain dataset. This subset is then processed in the FGC stage, where wrongly annotated samples are corrected. Finally, we obtain fine-grained certain data, which are subsequently used for model training. Panel (b) illustrates that CGP calculates the weight of each sample and prunes those with low weights. Panel (c) demonstrates that FGC corrects samples that are stably mispredicted across epochs, thus ensuring that correctly annotated samples are used for training.
Figure 2: Confusion matrices of different methods on FERV39k and DFEW.
Figure 3: Illustration of pruned and corrected samples in FERV39k and DFEW. (a) illustrates the weight distribution of selected samples (high quality) and pruned samples (low quality). (b) illustrates the corrected samples. The upper group shows samples relabeled from a non-neutral label to the corrected label, and the lower group shows samples relabeled from 'Neutral' to the corrected label.
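The correction rule described for FGC in Figure 1 (relabel samples that are stably mispredicted across epochs) can be sketched as follows. The stability criterion used here, identical predictions over the last `window` epochs, is an assumption, as are the names `fine_grained_correct` and `pred_history`; the source does not specify how stability is measured.

```python
from typing import Dict, List


def fine_grained_correct(
    labels: List[int], pred_history: Dict[int, List[int]], window: int = 3
) -> List[int]:
    """Return a copy of `labels` in which stably mispredicted samples are relabeled.

    A sample counts as stably mispredicted (assumed criterion) when its last
    `window` per-epoch predictions are identical and differ from its label.
    """
    corrected = list(labels)
    for idx, preds in pred_history.items():
        recent = preds[-window:]
        if len(recent) < window:
            continue  # Not enough epochs observed yet to judge stability.
        is_stable = len(set(recent)) == 1
        if is_stable and recent[0] != labels[idx]:
            corrected[idx] = recent[0]
    return corrected


# Toy data: integer class ids and per-epoch predictions for three samples.
labels = [0, 2, 1]
pred_history = {
    0: [0, 0, 0],     # stable and consistent with the label: unchanged
    1: [3, 1, 1, 1],  # stably predicted as class 1, labeled 2: relabeled
    2: [1, 4, 1],     # unstable predictions: unchanged
}
new_labels = fine_grained_correct(labels, pred_history, window=3)
print(new_labels)
```

Returning a new list rather than editing `labels` in place keeps the original annotations available for comparison.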
