Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

Fengyuan Zhang; Zhaopei Huang; Xinjie Zhang; Qin Jin

TL;DR

This paper addresses micro-expression recognition (MER) by explicitly modeling temporal dependencies across entire clips. It introduces the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN), which combines Temporal Motion Pairing & Encoding with an Adaptive Temporal Motion Layer to fuse global and local motion information through a graph containing a Global Motion Node and Local Motion Nodes. The method uses adaptive edge weights, a forgetting-rate-based adjacency update, and a Self-Attention classifier. It achieves state-of-the-art results on the CAS(ME)$^3$ and Composite datasets and competitive performance on SAMM and CASME II. The framework improves MER performance by mitigating temporal redundancy and emphasizing temporally informative motions, and attention-based visualization supports its ability to focus on critical temporal regions.

Abstract

Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing works either neglect temporal dependencies or suffer from redundancy issues in clip-level recognition. In this work, we propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN). Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level. Specifically, the integration of Adaptive Temporal Motion layers empowers our method to aggregate global and local motion features inherent in micro-expressions. Experimental results demonstrate that ATM-GCN not only surpasses existing state-of-the-art methods, particularly on the Composite dataset, but also achieves superior performance on the latest micro-expression dataset CAS(ME)$^3$.

Paper Structure (21 sections, 12 equations, 5 figures, 4 tables)

Introduction
Related Works
Micro-Expression Recognition
GCNs for MER
Method
Temporal Motion Pairing & Encoding
Adaptive Temporal Motion guided GCN
Nodes
Edges
Edge Weighting Strategies
Adaptive Temporal Motion Layer
Classifier
Experiment
Experiment Settings
Implementation Details
...and 6 more sections

Figures (5)

Figure 1: Macro- vs Micro-expression of Happiness
Figure 2: An overview of the proposed ATM-GCN approach for micro-expression recognition. For simplicity, $f_o$ and $f_a$ denote the Onset and Apex frames, respectively. The input sequence is first fed into the Motion Pairing & Encoding module, which extracts motion features between frame pairs; these features are then aggregated by the Adaptive Temporal Motion GCN (ATM-GCN) module. Finally, a Classifier module predicts the micro-expression for the input sequence.
Figure 3: Detailed illustration of our Motion Pairing & Encoding module. For simplicity, $f_a$ denotes the Apex frame $f_{apex}$ and $m_a$ denotes the corresponding $m_{apex}$. The Onset frame is paired with each of the other $L-1$ frames. We then extract motion features from the pairs for node initialization.
Figure 4: The detailed graph construction process of our ATM-GCN. For simplicity, $v_g$ denotes the Global Motion Node $v_{global}$. $v_1$ is trivial and is removed during graph construction. The initial node features $\{h_i^{(0)}\}$ are fed into the ATM-GCN module for graph construction and processing.
Figure 5: Visualization of attention maps for samples from Subjects 1, 8, and 6 in CAS(ME)$^3$, respectively. Best viewed in color.
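
The pairing step described in the Figure 3 caption (the Onset frame paired with each of the other $L-1$ frames) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names `pair_with_onset` and `toy_motion_feature` are hypothetical, frames are represented as flat lists of pixel values, and a plain element-wise difference stands in for the paper's motion encoder, whose details are not given here.

```python
def pair_with_onset(frames):
    """Pair the onset frame (assumed to be frames[0]) with each other frame.

    For a clip of L frames this returns L - 1 (onset, frame) pairs.
    """
    if len(frames) < 2:
        raise ValueError("need an onset frame and at least one other frame")
    onset = frames[0]
    return [(onset, frame) for frame in frames[1:]]


def toy_motion_feature(onset, frame):
    """Hypothetical stand-in for a motion encoder: element-wise difference.

    The real method uses a learned encoder; this only shows the data flow.
    """
    return [later - first for first, later in zip(onset, frame)]


# A toy clip of L = 4 "frames", each flattened to two pixel values.
clip = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.5], [0.5, 0.0]]

pairs = pair_with_onset(clip)
features = [toy_motion_feature(onset, frame) for onset, frame in pairs]
print(len(features))  # one motion feature per non-onset frame
```

In this sketch each motion feature would initialize one graph node; how the paper assigns these features to the Global Motion Node and the Local Motion Nodes is not specified in the captions above, so that step is deliberately left out.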
