FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition
Luu Tu Nguyen, Vu Tram Anh Khuong, Thi Bich Phuong Man, Thi Duyen Ngo, Thanh Ha Le
TL;DR
This work tackles micro-expression recognition (MER) by addressing the incomplete temporal modeling of methods that rely on single-phase (onset–apex) optical flow. It introduces Magnitude-Modulated Combined Optical Flow (MM-COF), which fuses onset–apex and apex–offset motion with magnitude modulation, and FMANet, an end-to-end architecture that performs this dual-phase fusion and modulation with learnable modules. Its Phase-Aware Consensus Fusion Block and Soft Motion Attention Block provide adaptive, data-driven phase fusion and saliency weighting on top of a shallow CNN backbone chosen for efficiency. Across CASME-II, SAMM, and MMEW, FMANet achieves state-of-the-art or competitive performance, demonstrating robust generalization and the importance of modeling the complete micro-expression dynamics. The results indicate that dual-phase motion modeling with learned fusion and attention can substantially improve MER accuracy and reliability in practical settings.
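The TL;DR does not describe the internals of the two blocks. Purely as an illustration, the NumPy sketch below shows one way a phase-aware fusion followed by a soft motion attention could act on two optical-flow fields in a forward pass. The functions `consensus_fusion` and `soft_motion_attention`, the magnitude-based phase scores, and the parameters `w`, `b`, `gamma`, and `beta` are hypothetical stand-ins for learned weights; they are not FMANet's actual layers, which sit inside a CNN and are trained end to end.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def consensus_fusion(flow_a, flow_b, w, b):
    """Hypothetical phase-aware fusion of two (H, W, 2) flow fields.

    Each phase gets a per-pixel score w[k] * magnitude + b[k] (one weight and
    bias per phase); a softmax over the two phases turns the scores into
    weights that sum to 1 at every pixel. Both fields are assumed to be
    oriented consistently beforehand.
    """
    w = np.asarray(w, dtype=float).reshape(2, 1, 1)
    b = np.asarray(b, dtype=float).reshape(2, 1, 1)
    mags = np.stack([np.linalg.norm(flow_a, axis=-1),
                     np.linalg.norm(flow_b, axis=-1)], axis=0)  # (2, H, W)
    weights = softmax(w * mags + b, axis=0)                      # (2, H, W)
    flows = np.stack([flow_a, flow_b], axis=0)                   # (2, H, W, 2)
    return (weights[..., None] * flows).sum(axis=0)              # (H, W, 2)


def soft_motion_attention(fused, gamma, beta, eps=1e-8):
    """Hypothetical soft attention: a sigmoid saliency map from motion magnitude."""
    mag = np.linalg.norm(fused, axis=-1)                         # (H, W)
    z = (mag - mag.mean()) / (mag.std() + eps)                   # standardized magnitude
    attn = 1.0 / (1.0 + np.exp(-(gamma * z + beta)))             # values in (0, 1)
    return fused * attn[..., None], attn


# Usage on random fields standing in for real optical flow.
rng = np.random.default_rng(0)
onset_apex = rng.normal(size=(8, 8, 2))
offset_apex = rng.normal(size=(8, 8, 2))  # apex->offset flow, already sign-flipped
fused = consensus_fusion(onset_apex, offset_apex, w=(2.0, 1.0), b=(0.0, 0.0))
weighted, saliency = soft_motion_attention(fused, gamma=1.0, beta=0.0)
print(fused.shape, saliency.shape)  # (8, 8, 2) (8, 8)
```

Because the per-pixel phase weights sum to 1, fusing two identical fields returns the input unchanged, which is a quick sanity check for any fusion module of this kind. Separate biases per phase matter here: a single shared bias would cancel out inside the softmax.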
Abstract
Facial micro-expressions, characterized by their subtle and brief nature, are valuable indicators of genuine emotions. Despite their significance in psychology, security, and behavioral analysis, micro-expressions remain difficult to recognize automatically because the facial movements involved are subtle and hard to capture. Optical flow is widely used as an input modality for this task because it directly encodes pixel-level facial motion between frames. However, most existing methods compute optical flow only between the onset and apex frames, thereby overlooking essential motion information in the apex-to-offset phase. To address this limitation, we first introduce a comprehensive motion representation, termed Magnitude-Modulated Combined Optical Flow (MM-COF), which integrates motion dynamics from both micro-expression phases into a unified descriptor suitable for direct use in recognition networks. Building on this principle, we then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules. This allows the network to adaptively fuse motion cues and focus on salient facial regions for classification. Experiments on four standard benchmarks, MMEW, SMIC, CASME-II, and SAMM, demonstrate that both the proposed MM-COF representation and FMANet outperform existing methods, underscoring the potential of a learnable, dual-phase framework for advancing micro-expression recognition.
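The abstract describes MM-COF only at a high level. A minimal NumPy sketch of the general idea might look like the following: it merges the onset–apex and apex–offset flow fields into one field, then scales each vector by its normalized magnitude. The weighted-sum fusion, the sign flip of the apex–offset field, the `alpha` and `power` parameters, the three-channel output, and the name `mm_cof_sketch` are all illustrative assumptions, not the paper's formula. The flow fields are assumed to be precomputed (for example, with a TV-L1 or Farnebäck estimator).

```python
import numpy as np


def mm_cof_sketch(flow_onset_apex, flow_apex_offset, alpha=0.5, power=1.0, eps=1e-8):
    """Hypothetical MM-COF-style descriptor (illustrative, not the paper's formula).

    flow_onset_apex, flow_apex_offset: (H, W, 2) arrays of (u, v) displacements.
    alpha: assumed weight on the onset->apex phase, in [0, 1].
    power: assumed exponent of the magnitude modulation.
    Returns an (H, W, 3) array: modulated u, modulated v, normalized magnitude.
    """
    fa = np.asarray(flow_onset_apex, dtype=float)
    fb = np.asarray(flow_apex_offset, dtype=float)
    if fa.shape != fb.shape or fa.shape[-1] != 2:
        raise ValueError("expected two flow fields of identical shape (H, W, 2)")
    # Assumption: the release phase roughly mirrors the onset phase, so the
    # apex->offset field is negated to point toward the apex like onset->apex.
    combined = alpha * fa + (1.0 - alpha) * (-fb)
    mag = np.linalg.norm(combined, axis=-1)                # (H, W)
    norm_mag = mag / (mag.max() + eps)                     # rescaled to [0, 1]
    modulated = combined * (norm_mag ** power)[..., None]  # damp weak, keep strong motion
    return np.concatenate([modulated, norm_mag[..., None]], axis=-1)


# Usage: a single moving pixel whose motion reverses during the release phase.
onset_apex = np.zeros((4, 4, 2))
onset_apex[1, 1] = [3.0, 4.0]   # magnitude 5
apex_offset = -onset_apex       # motion returns to neutral
desc = mm_cof_sketch(onset_apex, apex_offset)
print(desc.shape)               # (4, 4, 3)
print(np.round(desc[1, 1], 3))  # [3. 4. 1.]
```

Because the modulation factor lies in [0, 1], weak (often noisy) vectors are damped more than strong ones; raising `power` above 1 makes this suppression more aggressive.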
