A Benchmark for Incremental Micro-expression Recognition

Zhengqin Lai, Xiaopeng Hong, Yabin Wang, Xiaobai Li

TL;DR

This work defines Incremental Micro-Expression Recognition (IMER) as learning from a sequence of datasets $D^{(t)}$ where each $D^{(t)} = \{(x_j^t, y_j^t, id_j^t)\}$ and the cumulative label space is $L_t = \bigcup_{k=1}^t l^{(k)}$, with the model updated per session to recognize all encountered classes. It organizes MER data chronologically (CASME II, SAMM, MMEW, CAS(ME)$^3$) and introduces fold-binding cross-session evaluation alongside two within-session protocols (SLCV and ILCV), to manage cross-dataset and cross-subject testing efficiently. To address the composite class-domain incremental nature of MER, it presents a Remappable Classification Head (RCH) that maintains per-session heads $H^t$ and aggregates them via $H_{final}^c = \sum_{t \in \mathcal{T}_c} H_t^c$, enabling $p(c|\mathbf{x}) = \text{softmax}(\mathbf{x}^T H_{final}^c)$. Six baselines built on backbones like ResNet, ViT, and Swin Transformer are evaluated with RCH, and results show transformer-based, pre-trained-model approaches (e.g., RanPAC) yielding the strongest performance across protocols, thereby establishing a practical IMER benchmark with clear avenues for future research.
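
The aggregation rule above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary-based head layout, the emotion names, and the toy weight vectors are assumptions made for illustration. Each session contributes a weight vector for every class it has seen, the vectors for the same class are summed across sessions, and a softmax over all classes in the cumulative label space gives the prediction.

```python
import math


def aggregate_heads(session_heads):
    """Sum per-session class weight vectors into one final vector per class.

    session_heads: list of dicts (one per session) mapping a class name to
    its weight vector. This layout is a hypothetical choice for the sketch.
    """
    final = {}
    for head in session_heads:
        for cls, weights in head.items():
            if cls in final:
                final[cls] = [a + b for a, b in zip(final[cls], weights)]
            else:
                final[cls] = list(weights)
    return final


def predict_proba(x, final_head):
    """Softmax over the logits x^T H_final^c for every class c seen so far."""
    classes = sorted(final_head)
    logits = [sum(xi * wi for xi, wi in zip(x, final_head[c])) for c in classes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


# Toy heads (hypothetical values): "happiness" appears in both sessions,
# so its two weight vectors are summed; the other classes appear once.
session_heads = [
    {"happiness": [1.0, 0.0], "disgust": [0.0, 1.0]},
    {"happiness": [0.5, 0.0], "surprise": [0.0, -1.0]},
]
final_head = aggregate_heads(session_heads)
probs = predict_proba([1.0, 0.0], final_head)
```

The keys of `final_head` play the role of the cumulative label space $L_t$: a class introduced in any session stays recognizable afterwards, and a class that recurs in a new domain reuses the same slot instead of creating a duplicate output.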

Abstract

Micro-expression recognition plays a pivotal role in understanding hidden emotions and has applications across various fields. Traditional recognition methods assume access to all training data at once, but real-world scenarios involve continuously evolving data streams. To meet the need to adapt to new data while retaining previously learned knowledge, we introduce the first benchmark specifically designed for incremental micro-expression recognition. Our contributions are fourfold. First, we formulate an incremental learning setting tailored to micro-expression recognition. Second, we organize sequential datasets with carefully curated learning orders that reflect real-world scenarios. Third, we define two cross-evaluation-based testing protocols, each targeting a distinct evaluation objective. Finally, we provide six baseline methods and their corresponding evaluation results. This benchmark lays the groundwork for advancing incremental micro-expression recognition research. All source code used in this study will be publicly available at https://github.com/ZhengQinLai/IMER-benchmark.

Paper Structure

This paper contains 18 sections, 4 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: The incremental micro-expression recognition (IMER) task necessitates learning from a mixture of samples belonging to new categories and originating from new domains.
  • Figure 2: Illustration of the IMER task. The model sequentially processes datasets $\{D^{(i)}\}_{i=1}^n$, where each $D^{(i)}$ contains samples in the form of $(x_j^i, y_j^i, id_j^i)$. At session $S_i$, the model encounters a new label space $l^i$ and updates its knowledge to recognize the cumulative label space $L_i = \bigcup_{k=1}^{i} l^k$. The example demonstrates how the model $\Theta_i$ evolves through sessions, learning to recognize both new emotion categories and their variations across different subject populations and recording conditions.
  • Figure 3: Illustration of (a) fold-binding cross-session evaluation, where the same fold index is maintained across sessions to form consistent trials, and each session uses all previously bound folds plus the current trial-bound fold as test sets (shown in yellow). Panels (b) and (c) show the within-session data partition protocols used in the pipeline of (a): in (b), the subject-level partition strategy, data is divided by subject identifier; in (c), the instance-level partition strategy, individual samples are randomly distributed across folds.
  • Figure 4: Overview of the proposed Remappable Classification Head (RCH). Top-left: The core RCH technique, where classification heads are maintained and remapped across sessions (same-colored circles indicate identical emotion classifiers). Remaining panels: RCH's compatibility with different backbone networks and incremental learning methods, demonstrating its versatility as a general solution.
  • Figure 5: Performance comparison of ViT, SwinT, and ResNet under the ILCV and SLCV protocols.
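
The partition and fold-binding ideas described in the Figure 3 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the benchmark's code: the sample dictionaries, the function names, the round-robin fold assignment, and the toy sessions "A" and "B" are all hypothetical. Subject-level partitioning keeps every sample of a subject in one fold, instance-level partitioning shuffles samples individually, and fold binding reuses the same fold index in every session when building the test set.

```python
import random


def subject_level_folds(samples, k, seed=0):
    """Subject-level split: all samples of one subject share a fold."""
    subjects = sorted({s["id"] for s in samples})
    random.Random(seed).shuffle(subjects)
    fold_of = {sid: i % k for i, sid in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for s in samples:
        folds[fold_of[s["id"]]].append(s)
    return folds


def instance_level_folds(samples, k, seed=0):
    """Instance-level split: samples are shuffled without regard to subject."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fold_bound_test_set(folds_per_session, trial, session):
    """Test set for `trial` at `session`: fold `trial` of sessions 0..session."""
    test = []
    for t in range(session + 1):
        test.extend(folds_per_session[t][trial])
    return test


def toy_session(name, n_subjects=4, per_subject=2):
    """Hypothetical stand-in for one dataset: a few samples per subject."""
    return [
        {"session": name, "id": f"{name}-s{i}", "x": j}
        for i in range(n_subjects)
        for j in range(per_subject)
    ]


sessions = [toy_session("A"), toy_session("B")]
k = 2
slcv = [subject_level_folds(s, k) for s in sessions]
ilcv = [instance_level_folds(s, k) for s in sessions]

# Trial 0, evaluated after the second session: fold 0 of both sessions.
test_trial0_session1 = fold_bound_test_set(slcv, trial=0, session=1)
```

Binding the fold index means each trial only trains one model per session, and the held-out fold from an earlier session is never seen in training by a later session of the same trial.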