
Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

TL;DR

This paper addresses micro-expression recognition (MER) by explicitly modeling temporal dependencies across entire clips. It introduces the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN), which combines Temporal Motion Pairing & Encoding with an Adaptive Temporal Motion Layer to fuse global and local motion information via a graph containing a Global Motion Node and Local Motion Nodes. The method uses adaptive edge weights, a forgetting-rate-based adjacency update, and a Self-Attention classifier. It achieves state-of-the-art results on the CAS(ME)$^3$ and Composite datasets and competitive performance on SAMM and CASME II. The framework improves MER performance by mitigating temporal redundancy and emphasizing temporally informative motions, and attention-based visualizations support its ability to focus on critical temporal regions.

Abstract

Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing works either neglect temporal dependencies or suffer from redundancy issues in clip-level recognition. In this work, we propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN). Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level. Specifically, the integration of Adaptive Temporal Motion layers empowers our method to aggregate global and local motion features inherent in micro-expressions. Experimental results demonstrate that ATM-GCN not only surpasses existing state-of-the-art methods, particularly on the Composite dataset, but also achieves superior performance on the latest micro-expression dataset CAS(ME)$^3$.

Paper Structure (21 sections, 12 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Macro- vs Micro-expression of Happiness
  • Figure 2: An overview of the proposed ATM-GCN approach for micro-expression recognition. For simplicity, $f_o$ and $f_a$ denote the Onset and Apex frames, respectively. The input sequence is first fed into the Motion Pairing & Encoding module, which extracts motion features between frame pairs; these features are then aggregated by the Adaptive Temporal Motion GCN (ATM-GCN) module. Finally, a Classifier module predicts the micro-expression for the input sequence.
  • Figure 3: Detailed illustration of our Motion Pairing & Encoding module. For simplicity, $f_a$ denotes the Apex frame $f_{apex}$ and $m_a$ denotes the corresponding $m_{apex}$. The Onset frame is paired with each of the other $L-1$ frames, and motion features are extracted from the pairs for node initialization.
  • Figure 4: The detailed graph construction process of our ATM-GCN. For simplicity, $v_g$ denotes the Global Motion Node $v_{global}$. $v_1$ is trivial and is removed during graph construction. The initial node features $\{h_i^{(0)}\}$ are input into the ATM-GCN module for graph construction and processing.
  • Figure 5: Visualization of attention maps for samples from Subjects 1, 8, and 6 in CAS(ME)$^3$, respectively. Best viewed in color.
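
The pairing step described in the Figure 3 caption (the Onset frame paired with each of the other $L-1$ frames) can be sketched as index bookkeeping. This is only an illustrative sketch, not the authors' implementation: the function name `pair_with_onset`, the assumption that the clip starts at the Onset frame (index 0), and the returned position of the Onset-Apex pair are all hypothetical choices made for this example. The motion encoder itself is not shown.

```python
from typing import List, Tuple


def pair_with_onset(num_frames: int, apex_index: int) -> Tuple[List[Tuple[int, int]], int]:
    """Pair the onset frame with every other frame of a clip.

    Assumption (not from the paper): the clip is trimmed so that the onset
    frame sits at index 0 and the apex frame at `apex_index`.

    Returns the list of (onset, other) index pairs and the position of the
    (onset, apex) pair inside that list.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    if not 1 <= apex_index < num_frames:
        raise ValueError("apex_index must lie in [1, num_frames)")

    # One pair per non-onset frame: L frames yield L - 1 pairs.
    pairs = [(0, i) for i in range(1, num_frames)]

    # Frame i is stored at position i - 1 because the onset itself is skipped.
    apex_pair_pos = apex_index - 1
    return pairs, apex_pair_pos


# Hypothetical 5-frame clip whose apex is frame 3.
pairs, apex_pos = pair_with_onset(num_frames=5, apex_index=3)
print(pairs)            # all onset-to-frame pairs
print(pairs[apex_pos])  # the onset-to-apex pair
```

Each pair would then be passed to a motion encoder to produce one initial node feature. Which pair initializes the Global Motion Node, and how the remaining pairs map to Local Motion Nodes, is defined in the paper rather than in this sketch.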