Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen
TL;DR
The paper tackles facial Action Unit (AU) recognition by explicitly modeling the hierarchical spatial relationships among AUs and facial dynamics at multiple spatial scales. It introduces MDHR, which comprises two modules: Multi-scale Facial Dynamic Modelling (MFD), which captures AU-related motion across multiple spatial scales with adaptive weighting, and Hierarchical Spatio-temporal AU Relationship Modelling (HSR), which learns local region-based AU dependencies and cross-regional AU interactions via a Graph Attention Network, followed by a Temporal Convolution Network for sequence prediction. The approach achieves state-of-the-art results on BP4D and DISFA, indicating that combining multi-scale dynamics with hierarchical AU relationships improves on static and other spatio-temporal methods. This work advances AU recognition by providing a unified framework that respects the anatomical and temporal structure of facial movements, with practical implications for affective computing and related applications.
Abstract
Human facial action units (AUs) are mutually related in a hierarchical manner: they are not only associated with each other in both the spatial and temporal domains, but AUs located in the same or nearby facial regions also show stronger relationships than AUs in different facial regions. Since no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamics and the hierarchical spatio-temporal relationships among AUs for recognizing their occurrence. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block to explicitly capture facial dynamics across frames at different spatial scales, which accounts for the heterogeneity in the range and magnitude of different AUs' activations. Then, a two-stage strategy is introduced to hierarchically model the relationships among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results on BP4D and DISFA show that our approach achieves new state-of-the-art performance in AU occurrence recognition. Our code is publicly available at https://github.com/CVI-SZU/MDHR.
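To make the idea of temporal differencing at several spatial scales more concrete, the following is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the function names (`avg_pool`, `temporal_difference`, `motion_energy`, `multi_scale_dynamics`), the use of average pooling to form coarser scales, and the softmax over per-scale motion energy (used here as a simple stand-in for the learned adaptive weighting block) are all assumptions made for illustration.

```python
import math


def avg_pool(frame, k):
    """Average-pool a 2D list with non-overlapping k x k windows (assumed scale operator)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(frame[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w - k + 1, k)
        ]
        for i in range(0, h - k + 1, k)
    ]


def temporal_difference(prev, curr):
    """Element-wise difference between two consecutive frames."""
    return [[c - p for p, c in zip(row_p, row_c)] for row_p, row_c in zip(prev, curr)]


def motion_energy(diff):
    """Mean absolute value of a difference map."""
    count = len(diff) * len(diff[0])
    return sum(abs(v) for row in diff for v in row) / count


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_dynamics(prev, curr, scales=(1, 2, 4)):
    """Compute frame differences at several spatial scales and fuse them.

    The weights here are a softmax over per-scale motion energy; this is a
    hypothetical stand-in for a learned adaptive weighting block.
    """
    energies = []
    for k in scales:
        diff = temporal_difference(avg_pool(prev, k), avg_pool(curr, k))
        energies.append(motion_energy(diff))
    weights = softmax(energies)
    fused = sum(w * e for w, e in zip(weights, energies))
    return energies, weights, fused


# Toy example: a still frame followed by a fine-grained checkerboard change.
prev_frame = [[0.0] * 4 for _ in range(4)]
curr_frame = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(4)] for i in range(4)]
energies, weights, fused = multi_scale_dynamics(prev_frame, curr_frame)
```

In this toy input the change is visible only at the finest scale, because the alternating values cancel once they are averaged over 2 x 2 or 4 x 4 windows. The finest scale therefore receives the largest weight, which illustrates why motion of different spatial extent may be better captured at different scales.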
