
From Macro to Micro: Boosting micro-expression recognition via pre-training on macro-expression videos

Hanting Li, Hongjing Niu, Feng Zhao

TL;DR

This paper tackles the performance bottleneck in micro-expression recognition (MER) caused by scarce micro-expression annotations. It introduces MA2MI, a generalized transfer learning paradigm that pre-trains on abundant macro-expression videos by reconstructing near-future frames in latent space. Central to the approach is MIACNet, a two-branch network that decouples facial position features from facial action features to localize micro-actions accurately. Pre-training on macro-expression data and then fine-tuning on micro-expression data yields state-of-the-art results on CASME II, SAMM, and MMEW, and ablations confirm the value of both reconstruction-based pre-training and position/action decoupling. The method reduces reliance on labeled micro-expression data, provides interpretable localization via heat maps, and shows that representations learned on macro-expressions transfer well to MER.
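The pre-training data described above consist of frame pairs taken a short offset apart within the same macro-expression video. Below is a minimal sketch of sampling one such pair. It assumes frames are flattened lists of pixel values and that the offset $\delta$ is drawn uniformly from an integer range $[a,b]$ (the range studied in Figure 4). The function name `sample_pretraining_pair` and the frame representation are hypothetical and are not taken from the paper's code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one frame flattened to a list of pixel intensities.
Frame = List[float]


def sample_pretraining_pair(
    video: Sequence[Frame], a: int, b: int, rng: random.Random
) -> Tuple[Frame, Frame, int]:
    """Sample (I_t, I_{t+delta}, delta) from one macro-expression video.

    delta is drawn uniformly from the integers in [a, min(b, len(video) - 1)],
    and t is chosen so that t + delta is still a valid frame index.
    """
    if not 1 <= a <= b:
        raise ValueError("expected 1 <= a <= b")
    if len(video) <= a:
        raise ValueError("video is too short for the smallest offset a")
    # Cap the upper bound so that at least one valid start index exists.
    delta = rng.randint(a, min(b, len(video) - 1))  # inclusive on both ends
    t = rng.randrange(0, len(video) - delta)        # exclusive upper bound
    return video[t], video[t + delta], delta
```

For example, with `video = [[float(i)] for i in range(10)]`, every returned pair satisfies `I_next[0] - I_t[0] == delta`. Capping the upper bound for short clips is one possible design choice; a real pipeline might skip such clips instead.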

Abstract

Micro-expression recognition (MER) has drawn increasing attention in recent years due to its potential applications in intelligent medicine and lie detection. However, the shortage of annotated data has been the major obstacle to further improving deep-learning-based MER methods. Intuitively, exploiting abundant macro-expression data to improve MER performance seems a feasible solution. However, the facial patterns of macro-expressions and micro-expressions differ significantly, which makes naive transfer learning difficult to apply directly. To tackle this issue, we propose a generalized transfer learning paradigm called MAcro-expression TO MIcro-expression (MA2MI). Under this paradigm, networks learn to represent subtle facial movements by reconstructing future frames. In addition, we propose a two-branch micro-action network (MIACNet) that decouples facial position features from facial action features, which helps the network locate facial actions more accurately. Extensive experiments on three popular MER benchmarks demonstrate the superiority of our method.
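To make the position/action split concrete, here is a deliberately trivial, non-learned illustration. It assumes, hypothetically, that the position branch passes through the static frame $I_t$, that the action branch keeps only the signed per-pixel change, and that a decoder adds the two to predict $I_{t+\delta}$. MIACNet's branches are learned networks whose internals are not described here, so this shows only the interface, not the paper's architecture. All function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical representation: one frame flattened to a list of pixel intensities.
Frame = List[float]


def position_branch(frame_t: Frame) -> Frame:
    # Toy stand-in for "where the facial parts are": a copy of the static frame I_t.
    return list(frame_t)


def action_branch(frame_t: Frame, frame_next: Frame) -> Frame:
    # Toy stand-in for "what moved": the signed per-pixel change I_{t+delta} - I_t.
    return [n - c for c, n in zip(frame_t, frame_next)]


def reconstruct(position: Frame, action: Frame) -> Frame:
    # Toy decoder: combines both streams to predict I_{t+delta}.
    return [p + d for p, d in zip(position, action)]


def action_heat_map(action: Frame) -> Tuple[Frame, int]:
    # Per-pixel magnitude of change, plus the index of the most active pixel
    # (the first one, if several tie).
    heat = [abs(d) for d in action]
    peak = max(range(len(heat)), key=heat.__getitem__)
    return heat, peak
```

With `frame_t = [0.5, 0.5, 0.5, 0.5]` and `frame_next = [0.5, 0.5625, 0.5, 0.5]`, the action stream is zero everywhere except index 1. As a result, `action_heat_map` reports a peak at index 1, and `reconstruct` recovers `frame_next` exactly. In this toy, the action stream alone carries no facial layout, which is why the position information travels through a separate input.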

Paper Structure (20 sections, 12 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: (a) Previous methods look for patterns (features) shared by macro-expressions and micro-expressions of the same category; $F_{mi}$ and $F_{ma}$ denote micro- and macro-expression features, respectively. (b) Our method instead pre-trains the network on adjacent frames of macro-expression videos so that it learns to represent small facial actions.
  • Figure 2: MIACNet for encoding subtle facial actions between $I_t$ and $I_{t+\delta}$.
  • Figure 3: The pre-training process on macro-expression data.
  • Figure 4: The impact of $\delta$ on the performance of the proposed MA2MI on three datasets. Each horizontal-axis label $[a,b]$ indicates that $\delta$ can be any integer in $[a,b]$.
  • Figure 5: Reconstruction results on DFEW. $\hat{I}_{t+\delta}$ is reconstructed from $I_t$ and $C_\Delta$.
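Taken together, the captions of Figures 2, 3, and 5 suggest the overall shape of the pre-training objective. The following is one hedged formulation. It assumes an action encoder $A$ (MIACNet) that produces $C_\Delta$, a decoder $G$, and a plain squared-error loss in pixel space. The paper may instead measure the error in a latent space, as the TL;DR indicates, and its own equations take precedence.

```latex
C_\Delta = A\left(I_t, I_{t+\delta}\right), \qquad
\hat{I}_{t+\delta} = G\left(I_t, C_\Delta\right), \qquad
\mathcal{L}_{\text{rec}} = \left\lVert \hat{I}_{t+\delta} - I_{t+\delta} \right\rVert_2^2,
\qquad \delta \in \{a, a+1, \dots, b\}.
```

Because $G$ sees $I_t$ directly, the only way to lower $\mathcal{L}_{\text{rec}}$ is for $C_\Delta$ to encode the change between the two frames. This is the sense in which reconstruction pressures the network to represent small facial actions.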