Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition

Bochao Zou, Zizheng Guo, Wenfeng Qin, Xin Li, Kangsheng Wang, Huimin Ma

TL;DR

This paper presents a temporal state transition architecture, grounded in the state space model, that replaces conventional window-level classification with video-level regression. It also proposes a synergistic strategy between spotting and recognition that improves overall analysis performance.

Abstract

Micro-expressions are involuntary facial movements that cannot be consciously controlled, conveying subtle cues with substantial real-world applications. The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals. Previous deep learning methods have primarily relied on classification networks that operate on sliding windows. However, fixed window sizes and window-level hard classification introduce numerous constraints. Additionally, these methods have not fully exploited the potential of complementary pathways for spotting and recognition. In this paper, we present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression. Furthermore, by leveraging the inherent connections between spotting and recognition tasks, we propose a synergistic strategy that enhances overall analysis performance. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The code and pre-trained models are available at https://github.com/zizheng-guo/ME-TST.

Paper Structure (14 sections, 4 figures, 4 tables)

This paper contains 14 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: A schematic diagram of state transitions. A non-ME state, $h(t)$, transitions to an ME state, $h(t+1)$, upon receiving the input from the onset frame, $x(t)$, subsequently outputting $y(t)$. As subsequent frames are processed, the ME state progressively strengthens until reaching the apex.
  • Figure 2: (a) The framework of the window-level classification method. (b) The framework of the proposed method. Here, "$+$" denotes addition, "$\times$" denotes multiplication, "$\sigma$" denotes the activation layer, and the trapezoid denotes the linear layer.
  • Figure 3: The schematic diagram of the result-level synergy strategy.
  • Figure 4: Visualization of an example from the CAS(ME)$^3$ dataset results.
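
The state transition described in the Figure 1 caption can be illustrated with a generic discrete linear state-space recurrence. The sketch below is only a toy under stated assumptions, not the paper's architecture: the scalar coefficients `a`, `b`, `c`, the function name `run_state_transition`, and the synthetic `motion` signal are all hypothetical and chosen for illustration. It shows how a state that starts at zero (non-ME) begins to rise once a non-zero input arrives (onset), strengthens over the following frames, and decays after the input stops, yielding one output per frame rather than one label per window.

```python
def run_state_transition(inputs, a=0.9, b=0.5, c=1.0, h0=0.0):
    """Toy scalar state-space recurrence (illustrative, not the paper's model).

    h(t+1) = a * h(t) + b * x(t)   # state update on receiving x(t)
    y(t)   = c * h(t+1)            # output read from the updated state
    """
    h = h0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # transition the hidden state with the current frame
        outputs.append(c * h)  # emit one score per frame
    return outputs


# Hypothetical per-frame motion signal: zero before onset, active for three frames.
motion = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
scores = run_state_transition(motion)

# The frame with the largest score plays the role of the apex in this toy example.
apex_index = max(range(len(scores)), key=scores.__getitem__)
```

With these assumed coefficients, the per-frame scores stay at zero before the onset, increase while the input is active, and fall off afterwards, so the output is a continuous curve over the whole sequence.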