Decoupled Doubly Contrastive Learning for Cross Domain Facial Action Unit Detection
Yong Li, Menglin Liu, Zhen Cui, Yi Ding, Yuan Zong, Wenming Zheng, Shiguang Shan, Cuntai Guan
TL;DR
The paper addresses cross-domain facial AU detection, where domain shifts impede robust AU recognition. It introduces Decoupled Doubly Contrastive Adaptation (D^2CA), a framework that disentangles AU-relevant from AU-irrelevant (domain) factors, leverages cross-domain face synthesis via Cyclical Feature Alignment, and enforces purification and alignment through image-level and feature-level contrastive learning. Key contributions include Universal Feature Purification, Cyclical Feature Alignment, and Doubly Contrastive Learning, which together enable semantically aligned cross-domain AU representations and visually convincing cross-domain synthesized faces. Empirical results across five AU benchmarks show that D^2CA consistently outperforms state-of-the-art cross-domain AU detection methods, with average F1 improvements in the range of 6–14%, demonstrating robust adaptation under diverse domain gaps. The approach offers a scalable, data-efficient pathway for robust AU detection in real-world, multi-domain scenarios and lays the groundwork for extending to broader facial analysis tasks.
Abstract
Despite their impressive performance, current vision-based facial action unit (AU) detection approaches are highly susceptible to variations across domains, and cross-domain AU detection methods remain under-explored. In response to this challenge, we propose a decoupled doubly contrastive adaptation (D$^2$CA) approach to learn a purified AU representation that is semantically aligned across the source and target domains. Specifically, we decompose latent representations into AU-relevant and AU-irrelevant components, with the objective of facilitating adaptation exclusively within the AU-relevant subspace. To achieve this feature decoupling, D$^2$CA is trained to disentangle AU and domain factors by assessing the quality of faces synthesized in cross-domain scenarios when either the AU or the domain attributes are modified. To further strengthen feature decoupling, particularly in scenarios with limited AU data diversity, D$^2$CA employs a doubly contrastive learning mechanism, comprising image-level and feature-level contrastive learning, to ensure the quality of synthesized faces and mitigate feature ambiguities. This new framework leads to an automatically learned, dedicated separation of AU-relevant and domain-relevant factors, and it enables intuitive, scale-specific control of cross-domain facial image synthesis. Extensive experiments demonstrate the efficacy of D$^2$CA in decoupling AU and domain factors, yielding visually pleasing cross-domain synthesized facial images. Meanwhile, D$^2$CA consistently outperforms state-of-the-art cross-domain AU detection approaches, achieving an average F1 score improvement of 6\%-14\% across various cross-domain scenarios.
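
As a rough illustration of what feature-level contrastive learning can look like, the sketch below computes a generic InfoNCE-style loss on plain Python lists. It is not the loss defined in the paper, which the abstract does not specify. The function names `cosine` and `info_nce`, the `temperature` value, and the choice of positives and negatives (for example, treating the AU-relevant features of a source face and of its cross-domain synthesized counterpart as a positive pair) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the paper's objective).

    anchor, positive: feature vectors that should be pulled together.
    negatives: list of feature vectors that should be pushed away.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit


# Assumed toy features: the anchor matches the first vector and not the second.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when the anchor is closer to its positive than to any negative, and grows when a negative is more similar, which is the behaviour a contrastive objective of this kind relies on to reduce feature ambiguity.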
