Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural Network

Riyadh Mohammed Almushrafy

TL;DR

This work tackles the absence of frame-level micro-expression intensity labels by introducing a dataset-agnostic, weakly supervised framework that uses triangular pseudo-intensity trajectories derived from onset–apex–offset annotations. It combines a ResNet18-based spatial encoder with a bidirectional GRU temporal model to predict dense frame-wise intensities, supervised by a composite loss including MSE, smoothness, and apex ranking. Across SAMM and CASME II, the model achieves strong temporal agreement with the pseudo-labels, significantly outperforming a frame-wise baseline and demonstrating robustness to dataset-specific differences. The approach offers a practical, reproducible path toward continuous micro-expression analysis under realistic annotation constraints and paves the way for cross-dataset generalization and integration with broader affective computing tasks.
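
The composite loss named above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary only names the three terms, so the first-order form of the smoothness penalty, the hinge form of the apex-ranking term, and the names `composite_loss`, `w_smooth`, `w_rank`, and `margin` (with their default values) are all assumptions.

```python
def composite_loss(pred, target, apex_index,
                   w_smooth=0.1, w_rank=0.1, margin=0.05):
    """Return (total, mse, smooth, rank) for one clip.

    pred, target: equal-length lists of frame-wise intensities.
    apex_index:   position of the apex frame in the resampled clip.
    Weights and margin are hypothetical values chosen for illustration.
    """
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("pred and target must share a length of at least 2")

    # Regression term: mean squared error against the pseudo-labels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n

    # Smoothness term (assumed form): mean squared first difference.
    smooth = sum((pred[i + 1] - pred[i]) ** 2 for i in range(n - 1)) / (n - 1)

    # Apex-ranking term (assumed form): hinge penalty whenever a non-apex
    # frame comes within `margin` of, or exceeds, the apex prediction.
    others = [p for i, p in enumerate(pred) if i != apex_index]
    rank = sum(max(0.0, margin - (pred[apex_index] - p)) for p in others) / len(others)

    total = mse + w_smooth * smooth + w_rank * rank
    return total, mse, smooth, rank
```

Under these assumptions, a prediction that matches a triangular target incurs no regression or ranking penalty, while a flat prediction is penalized by the ranking term because the apex frame does not stand out.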

Abstract

Micro-facial expressions are brief and involuntary facial movements that reflect genuine emotional states. While most prior work focuses on classifying discrete micro-expression categories, far fewer studies address the continuous evolution of intensity over time. Progress in this direction is limited by the lack of frame-level intensity labels, which makes fully supervised regression impractical. We propose a unified framework for continuous micro-expression intensity estimation using only weak temporal labels (onset, apex, offset). A simple triangular prior converts sparse temporal landmarks into dense pseudo-intensity trajectories, and a lightweight temporal regression model that combines a ResNet18 encoder with a bidirectional GRU predicts frame-wise intensity directly from image sequences. The method requires no frame-level annotation effort and is applied consistently across datasets through a single preprocessing and temporal alignment pipeline. Experiments on SAMM and CASME II show strong temporal agreement with the pseudo-intensity trajectories. On SAMM, the model reaches a Spearman correlation of 0.9014 and a Kendall correlation of 0.7999, outperforming a frame-wise baseline. On CASME II, it achieves up to 0.9116 and 0.8168, respectively, when trained without the apex-ranking term. Ablation studies confirm that temporal modeling and structured pseudo labels are central to capturing the rise-apex-fall dynamics of micro-facial movements. To our knowledge, this is the first unified approach for continuous micro-expression intensity estimation using only sparse temporal annotations.
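
To make the triangular prior concrete, here is a minimal sketch of how onset, apex, and offset frame indices could be turned into a dense pseudo-intensity trajectory. The function name `triangular_pseudo_intensity`, the normalization to a peak of 1 at the apex and 0 at onset and offset, and the uniform sampling of `num_frames` points are assumptions made for illustration; the abstract does not specify these details.

```python
def triangular_pseudo_intensity(onset, apex, offset, num_frames):
    """Sample a triangular intensity curve at num_frames uniformly spaced
    positions between onset and offset (inclusive).

    Assumed shape: 0 at onset, linear rise to 1 at apex, linear fall to 0
    at offset.
    """
    if not (onset <= apex <= offset) or onset == offset:
        raise ValueError("expected onset <= apex <= offset and onset < offset")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")

    step = (offset - onset) / (num_frames - 1)
    values = []
    for i in range(num_frames):
        # Clamp to guard against floating-point overshoot at the last sample.
        t = min(onset + i * step, offset)
        if t <= apex:
            # Rising edge; a clip whose apex coincides with onset starts at 1.
            v = 1.0 if apex == onset else (t - onset) / (apex - onset)
        else:
            # Falling edge; apex < offset is guaranteed on this branch.
            v = (offset - t) / (offset - apex)
        values.append(max(0.0, min(1.0, v)))
    return values


# Example: onset at frame 0, apex at frame 4, offset at frame 8, 9 samples.
print(triangular_pseudo_intensity(0, 4, 8, 9))
# → [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
```

Because the apex need not sit midway between onset and offset, the resulting triangle is generally asymmetric, with different slopes on the rising and falling edges.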

Paper Structure

This paper contains 34 sections, 10 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of the proposed weakly supervised framework. Top: a micro-expression clip is represented by $T$ uniformly resampled frames between onset and offset. Middle: example onset, apex, and offset frames taken from publication-permitted CASME II subjects (© Xiaolan Fu). Bottom: all $T$ frames are encoded by a ResNet18 backbone, aggregated temporally by a bidirectional GRU, and mapped to a continuous intensity trajectory supervised by triangular pseudo-labels derived from onset–apex–offset landmarks.
  • Figure 2: Triangular pseudo-intensity trajectory derived from onset–apex–offset landmarks. This prior provides dense supervision despite the absence of frame-level intensity labels.
  • Figure 3: Qualitative examples comparing predicted intensity trajectories (blue) with triangular pseudo-labels (gray). The GRU-based model produces smooth and coherent temporal curves that align with expected onset–apex–offset dynamics.
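
The agreement between a predicted trajectory and its pseudo-label, reported in the abstract as Spearman and Kendall correlations, can be illustrated with a small pure-Python sketch. It uses average ranks for Spearman and the tau-b variant for Kendall because triangular pseudo-labels contain tied values; whether the paper uses these exact variants, or computes the scores per clip or over pooled frames, is not stated here and is an assumption. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, which corrects for ties in either sequence."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: excluded from every count
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom > 0 else float("nan")
```

Both measures depend only on the ordering of frames, so a prediction that rises and falls at the right times scores highly even if its absolute scale differs from the pseudo-label.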