Meta-Auxiliary Learning for Micro-Expression Recognition

Jingyao Wang, Yunhan Tian, Yuxuan Yang, Xiaoxin Chen, Changwen Zheng, Wenwen Qiang

TL;DR

LightmanNet tackles micro-expression recognition under real-world constraints by deploying a dual-branch meta-auxiliary learning framework. It combines a primary MER branch with an auxiliary image-alignment branch, trained via bi-level optimization to first learn task-specific knowledge and then distill general MER knowledge across tasks. The auxiliary branch leverages macro-expression similarities to guide discriminative feature learning, reducing reliance on large labeled ME datasets. Across five benchmark datasets and robustness tests (including few-shot and noisy data), LightmanNet achieves state-of-the-art performance with favorable efficiency, indicating strong practical potential for real-world MER applications.

Abstract

Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection. However, despite its wide range of applications, micro-expression recognition (MER) remains a challenging problem in real life for three reasons: (i) data level: scarce data and imbalanced classes; (ii) feature level: the subtle, rapidly changing, and complex features of MEs; and (iii) decision-making level: the impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process. (i) In the first level, it obtains task-specific MER knowledge by learning in two branches: the first branch learns MER features via primary MER tasks, while the other branch guides the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, which resemble each other in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that may damage its generalization ability. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.
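
The two-level procedure described in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. It is not LightmanNet itself. It assumes a toy linear model with parameters `theta = [w, b]`, a squared-error stand-in for both the primary MER loss and the auxiliary alignment loss, and a first-order approximation of the second-level gradient. The names `alpha`, `beta`, `lam`, `make_task`, and all other helpers are hypothetical and do not come from the paper.

```python
# Toy first-order sketch of dual-branch bi-level optimization (illustrative only).
# theta = [w, b]: w is shared by both branches, b is used by the primary branch only.

def primary_loss_grad(theta, batch):
    """Mean squared error of the primary branch (w * x + b) and its gradient."""
    w, b = theta
    n = len(batch)
    loss = gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        loss += err * err / n
        gw += 2 * err * x / n
        gb += 2 * err / n
    return loss, [gw, gb]

def auxiliary_loss_grad(theta, batch):
    """Stand-in auxiliary loss: align the shared response w * x with a target z."""
    w, _ = theta
    n = len(batch)
    loss = gw = 0.0
    for x, z in batch:
        err = w * x - z
        loss += err * err / n
        gw += 2 * err * x / n
    return loss, [gw, 0.0]

def inner_adapt(theta, task, alpha, lam):
    """First level: one gradient step on primary loss + lam * auxiliary loss."""
    _, g_pri = primary_loss_grad(theta, task["support_pri"])
    _, g_aux = auxiliary_loss_grad(theta, task["support_aux"])
    return [t - alpha * (gp + lam * ga) for t, gp, ga in zip(theta, g_pri, g_aux)]

def meta_train(tasks, theta, alpha=0.1, beta=0.1, lam=0.5, epochs=200):
    """Second level: update theta with the averaged query loss of the adapted models."""
    for _ in range(epochs):
        meta_grad = [0.0, 0.0]
        for task in tasks:
            theta_i = inner_adapt(theta, task, alpha, lam)
            # First-order approximation: gradient is evaluated at theta_i.
            _, g = primary_loss_grad(theta_i, task["query_pri"])
            meta_grad = [m + gi / len(tasks) for m, gi in zip(meta_grad, g)]
        theta = [t - beta * m for t, m in zip(theta, meta_grad)]
    return theta

def make_task(slope, intercept):
    """Build a synthetic task with fixed support and query inputs."""
    xs_support = [-1.0, -0.5, 0.0, 0.5, 1.0]
    xs_query = [-0.75, -0.25, 0.25, 0.75]
    return {
        "support_pri": [(x, slope * x + intercept) for x in xs_support],
        "support_aux": [(x, slope * x) for x in xs_support],
        "query_pri": [(x, slope * x + intercept) for x in xs_query],
    }

def post_adaptation_loss(tasks, theta, alpha=0.1, lam=0.5):
    """Average primary query loss after one first-level step from theta."""
    total = 0.0
    for task in tasks:
        theta_i = inner_adapt(theta, task, alpha, lam)
        loss, _ = primary_loss_grad(theta_i, task["query_pri"])
        total += loss / len(tasks)
    return total

tasks = [make_task(1.8, 0.9), make_task(2.0, 1.0), make_task(2.2, 1.1)]
theta_init = [0.0, 0.0]
theta_meta = meta_train(tasks, theta_init)
print(post_adaptation_loss(tasks, theta_init), post_adaptation_loss(tasks, theta_meta))
```

In this sketch the auxiliary term only touches the shared parameter `w`, which mirrors the idea that the auxiliary branch shapes the shared features rather than the primary decision layer. Whether the second level of LightmanNet uses only the primary loss, and whether it uses exact second-order gradients, is not stated in the text above, so both choices here are assumptions.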

Paper Structure (24 sections, 4 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: Comparison of micro- and macro-expressions. (a) shows examples of different expressions, where the red arrows represent the direction of muscle movement. (b) shows the optical flow of expressions in a "happy" video sequence, where the highlighted regions indicate the areas of movement. (c) shows the change in amplitude of the receptive fields of different expressions, where the spikes indicate large pixel changes. More details are provided in the Appendix.
  • Figure 2: Overview of LightmanNet. It first builds various training tasks (gray boxes on the left), and then learns general MER-related knowledge via bi-level optimization: (i) in the first level (green circle), the model $f_{\theta}$ learns task-specific MER knowledge for task $\tau_i$ and obtains $f_{\theta}^i$ through two branches of learning, i.e., $f_{\theta}\to f_{\theta}^i$, where the primary branch is used to learn MER features, while the auxiliary branch is used to guide $f_{\theta}$ to obtain discriminative features; then, (ii) in the second level (purple circle), $f_{\theta}$ refines the learned task-specific knowledge of $f_{\theta}^i$, i.e., it is optimized with the cumulative loss of multiple $f_{\theta}^i$.
  • Figure 3: Illustration of our model architecture. The two branches of the model share the same 2D CNN for feature extraction (left side), but differ in the last layer of the encoder and in the decoder (right side).
  • Figure 4: Performance comparison (UAR) of the baselines and our proposed LightmanNet on the CDE dataset. We choose recently proposed baselines for comparison, which cover the SOTA and classic MER methods on the CDE dataset, including meta-learning-based and deep-learning-based approaches.
  • Figure 5: Model efficiency comparison of the baselines and LightmanNet on the CDE dataset, measured with the same batch size and the official code configurations. We record the performance of methods covering all four types of baselines in Section 5.1 over 5 rounds of experiments.
  • ...and 2 more figures
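
The bi-level update outlined in the Figure 2 caption can be written in a generic meta-learning form. The following is a sketch under stated assumptions, not the paper's own equations: the step sizes $\alpha$ and $\beta$, the weight $\lambda$, the support/query split $\mathcal{D}_i^{s}$, $\mathcal{D}_i^{q}$, the loss symbols $\mathcal{L}_{\mathrm{pri}}$, $\mathcal{L}_{\mathrm{aux}}$, and the use of a single gradient step per level are all introduced here for illustration.

```latex
% First level: task-specific adaptation f_\theta \to f_\theta^i on task \tau_i
\theta^{i} = \theta - \alpha \nabla_{\theta}
  \Big[ \mathcal{L}_{\mathrm{pri}}\big(f_{\theta}; \mathcal{D}_i^{s}\big)
      + \lambda \, \mathcal{L}_{\mathrm{aux}}\big(f_{\theta}; \mathcal{D}_i^{s}\big) \Big]

% Second level: refine \theta with the cumulative loss of the adapted models f_\theta^i
\theta \leftarrow \theta - \beta \nabla_{\theta}
  \sum_{i} \mathcal{L}_{\mathrm{pri}}\big(f_{\theta^{i}}; \mathcal{D}_i^{q}\big)
```

Here $f_{\theta^{i}}$ denotes the same model the caption writes as $f_{\theta}^i$, and the sum over $i$ corresponds to the "cumulative loss of multiple $f_{\theta}^i$".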