
Evaluating and Correcting Human Annotation Bias in Dynamic Micro-Expression Recognition

Feng Liu, Bingyu Nan, Xuezhong Qian, Xiaolan Fu

TL;DR

A novel Global Anti-Monotonic Differential Selection Strategy (GAMDSS) architecture that improves spatio-temporal modeling of micro-expressions by re-selecting keyframes through a dynamic frame reselection mechanism, offering a new way to enhance micro-expression recognition performance.

Abstract

Manual annotation of micro-expressions is prone to inaccuracy, particularly in cross-cultural scenarios, where deviations in key-frame labeling are more pronounced. To address this issue, this paper presents a novel Global Anti-Monotonic Differential Selection Strategy (GAMDSS) architecture that improves spatio-temporal modeling of micro-expressions through keyframe re-selection. Specifically, a dynamic frame reselection mechanism identifies the Onset and Apex frames, which exhibit significant micro-expression variation, within the complete micro-expression action sequence. These frames are then used to determine the Offset frame and to construct a rich spatio-temporal dynamic representation. A two-branch structure with shared parameters then extracts spatio-temporal features efficiently. Extensive experiments on seven widely recognized micro-expression datasets show that GAMDSS effectively reduces subjective errors caused by human factors in multicultural datasets such as SAMM and 4DME. Furthermore, quantitative analyses confirm that Offset-frame annotations in multicultural datasets are more uncertain, providing theoretical justification for standardizing micro-expression annotations. These findings directly support our argument for reconsidering the validity and generalizability of dataset annotation paradigms. Notably, the design can be integrated into existing models without increasing the number of parameters, offering a new approach to improving micro-expression recognition performance. The source code is available on GitHub (https://github.com/Cross-Innovation-Lab/GAMDSS).
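
The reselection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search window of `radius` frames around each manual annotation, the exhaustive search for the onset/apex pair with the largest L2 frame difference, and the rule that places the Offset by mirroring the onset-to-apex distance past the Apex are all assumptions, and `reselect_keyframes` and `frame_difference` are hypothetical names.

```python
import numpy as np


def frame_difference(a: np.ndarray, b: np.ndarray) -> float:
    """L2 norm of the pixel-wise difference between two frames."""
    return float(np.linalg.norm(a.astype(np.float64) - b.astype(np.float64)))


def reselect_keyframes(frames: np.ndarray, onset: int, apex: int, radius: int = 2):
    """Hypothetical keyframe reselection sketch (not the GAMDSS code).

    frames: array of shape (T, H, W) or (T, H, W, C).
    onset, apex: manually annotated frame indices (assumed onset < apex).
    radius: assumed half-width of the search window around each annotation.
    Returns (new_onset, new_apex, new_offset).
    """
    T = len(frames)
    onset_window = range(max(0, onset - radius), min(T, onset + radius + 1))
    apex_window = range(max(0, apex - radius), min(T, apex + radius + 1))

    # Keep the manual annotation unless a candidate pair shows a larger change.
    best_pair, best_score = (onset, apex), -1.0
    for i in onset_window:
        for j in apex_window:
            if j <= i:  # the onset must precede the apex
                continue
            score = frame_difference(frames[i], frames[j])
            if score > best_score:  # strict '>' keeps the earliest pair on ties
                best_pair, best_score = (i, j), score

    new_onset, new_apex = best_pair
    # Assumed offset rule (illustrative only): mirror the onset-to-apex
    # distance past the apex, clipped to the last frame.
    new_offset = min(T - 1, new_apex + (new_apex - new_onset))
    return new_onset, new_apex, new_offset
```

On a toy sequence of constant 4x4 frames with intensities `[0, 0, 1, 2, 3, 4, 5, 3, 1, 0]`, annotated onset 2 and apex 5 are moved to onset 0 and apex 6, with the offset clipped to frame 9. This mirrors the situation in Figure 2, where the annotated Apex does not coincide with the maximum frame difference.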

Paper Structure (19 sections, 17 equations, 8 figures, 7 tables, 1 algorithm)


Figures (8)

  • Figure 1: This paper aims to address the distortion of ground-truth labels caused by human subjective errors in the annotation of micro-expression datasets. (a) Traditional manual key-frame annotation involves three steps: global observation, local observation, and frame-by-frame observation. This process demands considerable expertise from annotators, and the frame-by-frame stage may introduce subjective errors. (b) Automatic annotation methods locate the Apex by computing inter-frame changes within a sliding window, without manual frame-by-frame comparison. However, they typically require an additional training process and introduce extra model parameters. (c) The proposed GAMDSS method re-selects key frames starting from the manual annotations, effectively reducing human subjective errors. Importantly, it is plug-and-play and introduces no additional parameters.
  • Figure 2: Per-sample difference curves for three datasets under the three-class setting. The L2 norm of the pixel-value difference between frames is computed frame by frame to quantify motion intensity; its peak serves as a key objective indicator of changes in expression intensity. The originally annotated Apex frames are marked with red dashed lines, and the maximum difference values obtained from the difference calculation are highlighted with dashed boxes.
  • Figure 3: Overview of the proposed GAMDSS architecture. (a) The GAMDSS pipeline proceeds as follows. First, the Dynamic Frame Reselection Mechanism reselects the three frames with the richest action changes for each dataset. Second, a backbone model and a feature processing method are selected. Next, spatio-temporal features are extracted at different stages by two spatio-temporal units with shared parameters; the temporal stream integrates the RMT module, which efficiently models long-term temporal dependencies through a retention mechanism based on Manhattan distance decay. Finally, the spatio-temporal features are fused, and an auxiliary loss function is introduced to inject additional knowledge, enabling the model to capture the complete evolution of micro-expressions. (b) The spatio-temporal feature extraction and fusion design, in which Swish activation layers enhance feature nonlinearity and improve optimization stability.
  • Figure 4: Conceptual design of GAMDSS. First, a set of candidate frames is determined from the manually annotated key frames. Second, the frame pair with the greatest action change is reselected from this set through difference calculation and used as the reselected Onset and Apex frames. Finally, the reselected Offset frame is determined from the reselected Apex frame.
  • Figure 5: Visualization of GAMDSS frame selection on sample sequences: pink indicates the original annotations, blue indicates the annotations reselected by GAMDSS, and yellow indicates the remaining frames.
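
The retention mechanism based on Manhattan distance decay mentioned in the Figure 3 caption can be illustrated with a small NumPy sketch modeled on the spatial decay used in RMT, where the weight between tokens n and m is gamma^(|x_n - x_m| + |y_n - y_m|). The decay rate `gamma`, the 1/sqrt(d) scaling, and the element-wise multiplication of the softmax attention map by the decay matrix are assumptions; `manhattan_decay` and `manhattan_retention` are hypothetical names, and this is not the GAMDSS temporal stream itself.

```python
import numpy as np


def manhattan_decay(h: int, w: int, gamma: float = 0.9) -> np.ndarray:
    """Decay matrix D[n, m] = gamma ** (|x_n - x_m| + |y_n - y_m|) over an h x w grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (h*w, 2), row-major order
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return gamma ** dist


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def manhattan_retention(q, k, v, h: int, w: int, gamma: float = 0.9) -> np.ndarray:
    """(softmax(Q K^T / sqrt(d)) * D) V, with D the Manhattan decay matrix.

    q, k, v: arrays of shape (h*w, d), one row per token.
    """
    d = q.shape[-1]
    scores = softmax(q @ k.T / np.sqrt(d))
    # Nearby tokens keep most of their attention weight; distant ones decay.
    return (scores * manhattan_decay(h, w, gamma)) @ v
```

Setting `h=1` collapses the grid to a sequence, so the matrix reduces to gamma^|t_n - t_m|, a purely temporal decay in which distant frames contribute less to each other.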
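
The two-branch structure with shared parameters (abstract and Figure 3) can be illustrated, very loosely, as one projection reused by a spatial and a temporal branch, followed by a Swish activation as in Figure 3(b). The class name `SharedTwoBranch`, the single linear projection, and the sum-then-Swish fusion are hypothetical simplifications for illustration, not the paper's actual spatio-temporal units.

```python
import numpy as np


def swish(x: np.ndarray) -> np.ndarray:
    """Swish activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))


class SharedTwoBranch:
    """Hypothetical two-branch block whose branches reuse one weight matrix."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((in_dim, out_dim)) * 0.02  # shared by both branches
        self.bias = np.zeros(out_dim)

    def num_params(self) -> int:
        """Total parameter count; identical to a single-branch block."""
        return self.weight.size + self.bias.size

    def __call__(self, spatial: np.ndarray, temporal: np.ndarray) -> np.ndarray:
        s = spatial @ self.weight + self.bias   # spatial branch
        t = temporal @ self.weight + self.bias  # temporal branch, same parameters
        return swish(s + t)                     # assumed fusion: sum, then Swish
```

Because both branches read the same `weight` and `bias`, `num_params()` equals that of a single branch: the second branch adds computation but no parameters of its own.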