Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Ruolin Li; Lu Wang; Tingting Yang; Lisheng Xu; Bingyang Ma; Yongchun Li; Hongchao Wei

TL;DR

Micro-expression recognition is hindered by ultra-short durations and subtle motions, compounded by limited data. The authors introduce MoExt, a motion extraction strategy that pre-trains a feature separator and a motion extractor on both macro- and micro-expression data, using apex-frame reconstruction and contrastive losses to learn ME-specific motion features from onset and apex frames. After pre-training, MoExt is integrated into a MER network and fine-tuned on ME data, enabling end-to-end recognition from onset/apex input. Experiments on CASME II, SAMM, CAS(ME)³ and SMIC-HS show MoExt achieving state-of-the-art or competitive results, validating its effectiveness and robustness. The method reduces overfitting through macro-data pre-training and explicit motion-texture separation, with potential impact on psychotherapy, security, and lie-detection applications.

Abstract

Micro-expressions (MEs) are spontaneous, unconscious facial expressions with promising applications in fields such as psychotherapy and national security. Micro-expression recognition (MER) has therefore attracted growing attention from researchers. Although various MER methods have emerged, especially with the development of deep learning techniques, the task still faces several challenges, e.g., subtle motion and limited training data. To address these problems, we propose a novel motion extraction strategy (MoExt) for the MER task and use additional macro-expression data in the pre-training process. We pre-train the feature separator and motion extractor primarily with a contrastive loss, enabling them to extract representative motion features. In MoExt, shape features and texture features are first extracted separately from the onset and apex frames, and motion features related to MEs are then extracted from the shape features of both frames. To help the model separate features more effectively, we use the extracted motion features and the texture features of the onset frame to reconstruct the apex frame. Through pre-training, the module learns to extract inter-frame motion features of facial expressions while excluding irrelevant information. The feature separator and motion extractor are finally integrated into the MER network, which is then fine-tuned on the target ME data. The effectiveness of the proposed method is validated on four commonly used datasets: CASME II, SMIC, SAMM, and CAS(ME)³. The results show that our method performs favorably against state-of-the-art methods.
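The data flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random linear maps standing in for the feature separator, motion extractor, and reconstruction module, the feature sizes, the function names, and the mean-squared-error reconstruction loss are all hypothetical placeholders for the learned networks and losses used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_FEAT = 64, 16  # hypothetical flattened-frame and feature sizes

# Random linear maps stand in for the learned modules (assumption).
W_shape = rng.normal(size=(D_FEAT, D_IMG)) * 0.1
W_texture = rng.normal(size=(D_FEAT, D_IMG)) * 0.1
W_motion = rng.normal(size=(D_FEAT, 2 * D_FEAT)) * 0.1
W_recon = rng.normal(size=(D_IMG, 2 * D_FEAT)) * 0.1


def feature_separator(frame):
    """Split one frame into (shape, texture) feature vectors."""
    return W_shape @ frame, W_texture @ frame


def motion_extractor(shape_onset, shape_apex):
    """Derive motion features from the shape features of both frames."""
    return np.tanh(W_motion @ np.concatenate([shape_onset, shape_apex]))


def reconstruct_apex(motion, texture_onset):
    """Rebuild the apex frame from motion features plus onset texture."""
    return W_recon @ np.concatenate([motion, texture_onset])


def pretraining_step(onset, apex):
    """One forward pass of the pre-training stage (no parameter update)."""
    shape_on, texture_on = feature_separator(onset)
    shape_ap, _ = feature_separator(apex)
    motion = motion_extractor(shape_on, shape_ap)
    apex_hat = reconstruct_apex(motion, texture_on)
    # MSE is an assumed stand-in for the paper's reconstruction loss.
    recon_loss = float(np.mean((apex_hat - apex) ** 2))
    return motion, apex_hat, recon_loss


# Toy frames: the apex differs from the onset only by a small perturbation.
onset = rng.normal(size=D_IMG)
apex = onset + 0.05 * rng.normal(size=D_IMG)
motion, apex_hat, recon_loss = pretraining_step(onset, apex)
```

In the fine-tuning stage described above, only `feature_separator` and `motion_extractor` would be carried over; the resulting `motion` vector would feed a classifier in place of the reconstruction module.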

Paper Structure (20 sections, 13 equations, 6 figures, 6 tables)

Introduction
Related work
Method
Pre-training
Motion extraction strategy MoExt
Feature separator
Motion extractor
Apex reconstruction
Objective learning
Loss function
Experiments
Experiments setting
Datasets
Experimental settings
Evaluation metrics
...and 5 more sections

Figures (6)

Figure 1: The main steps of the proposed method: facial cropping, pre-training to reconstruct apex frames, and objective learning to classify MEs.
Figure 2: Pre-training network framework. The feature separator is responsible for extracting shape and texture features, while the motion extractor is responsible for extracting motion features of MEs using shape features. The reconstruction module is responsible for reconstructing the apex frames.
Figure 3: (a) shows a comparison between the onset and apex frame of the ME, while (b) shows a comparison between the onset and fifth frame of a macro-expression. The red box marks the motion area.
Figure 4: Overview of the feature separator structure. The feature separator is responsible for separating the shape features and texture features. In this case, the generic features are extracted by the backbone, and then fed to the shape branch and texture branch to extract shape features and texture features, respectively.
Figure 5: Illustration of contrastive losses, with the middle section providing an explanation of the inputs, the left side illustrating the contrastive loss $L_{\text{st}}$ for texture and shape features, and the right side illustrating the contrastive loss $L_{\text{ss}}$ for shape features between onset and apex frames.
...and 1 more figure
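Figure 5 names two contrastive terms, $L_{\text{st}}$ and $L_{\text{ss}}$, but their formulas are not given here. Purely as a generic illustration, the sketch below implements a cosine-embedding style contrastive loss (the same form as PyTorch's `CosineEmbeddingLoss`). Which feature pairs the paper treats as similar or dissimilar, the margin value, and the function name are assumptions made for the example.

```python
import numpy as np


def cosine_contrastive(a, b, similar, margin=0.0):
    """Cosine-embedding contrastive loss for one pair of feature vectors.

    similar=True  -> pull the pair together: loss = 1 - cos(a, b)
    similar=False -> push the pair apart:    loss = max(0, cos(a, b) - margin)
    """
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    if similar:
        return 1.0 - cos
    return max(0.0, cos - margin)


# Hypothetical pairing: treat shape and texture features of one frame as a
# dissimilar pair, so that the two branches encode different information.
shape_feat = np.array([1.0, 0.0])
texture_feat = np.array([0.0, 1.0])

loss_orthogonal = cosine_contrastive(shape_feat, texture_feat, similar=False)
loss_identical = cosine_contrastive(shape_feat, shape_feat, similar=False)
loss_pulled = cosine_contrastive(shape_feat, shape_feat, similar=True)
```

Orthogonal features incur no penalty when treated as a dissimilar pair, whereas identical features incur the maximum penalty; the `similar=True` branch rewards aligned features instead.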
