Decoupled Doubly Contrastive Learning for Cross Domain Facial Action Unit Detection

Yong Li, Menglin Liu, Zhen Cui, Yi Ding, Yuan Zong, Wenming Zheng, Shiguang Shan, Cuntai Guan

TL;DR

The paper addresses cross-domain facial AU detection, where domain shifts impede robust AU recognition. It introduces Decoupled Doubly Contrastive Adaptation (D$^2$CA), a framework that disentangles AU-relevant from AU-irrelevant (domain) factors, leverages cross-domain face synthesis via Cyclical Feature Alignment, and enforces purification and alignment through image-level and feature-level contrastive learning. Key contributions include Universal Feature Purification, Cyclical Feature Alignment, and Doubly Contrastive Learning, which together enable semantically aligned cross-domain AU representations and visually convincing cross-domain synthesized faces. Empirical results across five AU benchmarks show that D$^2$CA consistently outperforms state-of-the-art cross-domain AU detection methods, with average F1 improvements in the range of 6–14%, demonstrating robust adaptation under diverse domain gaps. The approach offers a scalable, data-efficient pathway for robust AU detection in real-world, multi-domain scenarios and lays the groundwork for extending to broader facial analysis tasks.

Abstract

Despite the impressive performance of current vision-based facial action unit (AU) detection approaches, they are heavily susceptible to the variations across different domains and the cross-domain AU detection methods are under-explored. In response to this challenge, we propose a decoupled doubly contrastive adaptation (D$^2$CA) approach to learn a purified AU representation that is semantically aligned for the source and target domains. Specifically, we decompose latent representations into AU-relevant and AU-irrelevant components, with the objective of exclusively facilitating adaptation within the AU-relevant subspace. To achieve the feature decoupling, D$^2$CA is trained to disentangle AU and domain factors by assessing the quality of synthesized faces in cross-domain scenarios when either AU or domain attributes are modified. To further strengthen feature decoupling, particularly in scenarios with limited AU data diversity, D$^2$CA employs a doubly contrastive learning mechanism comprising image and feature-level contrastive learning to ensure the quality of synthesized faces and mitigate feature ambiguities. This new framework leads to an automatically learned, dedicated separation of AU-relevant and domain-relevant factors, and it enables intuitive, scale-specific control of the cross-domain facial image synthesis. Extensive experiments demonstrate the efficacy of D$^2$CA in successfully decoupling AU and domain factors, yielding visually pleasing cross-domain synthesized facial images. Meanwhile, D$^2$CA consistently outperforms state-of-the-art cross-domain AU detection approaches, achieving an average F1 score improvement of 6\%-14\% across various cross-domain scenarios.

Paper Structure

This paper contains 14 sections, 12 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Main idea of our proposed Decoupled Doubly Contrastive Adaptation (D$^2$CA) method. D$^2$CA learns the decoupled AU-relevant features via encoding the AU-/domain-relevant features and conducting cross-domain face image generation. To consolidate the feature decoupling and adaptation, D$^2$CA utilizes the doubly contrastive learning (CL) paradigm that consists of image (ICL) and feature (FCL) level contrastive learning to ensure the quality of the synthesized face images.
  • Figure 2: Results of domain-preserving AU interpolation between two source faces. Left column: target face. Second and last columns: source faces. The synthesized target faces indicate that the AU-relevant representations have been well disentangled from the domain-relevant parts. Zoom in for details.
  • Figure 3: (a) shows the framework of D$^2$CA. Given a randomly sampled image pair, in Universal Feature Decoupling, D$^2$CA encodes the AU-relevant (domain-irrelevant) and domain-relevant (AU-irrelevant) features via the shared AU encoder and exclusive domain encoders. Subsequently, the AU features obtained from different sources are exchanged to generate AU-altered facial images. In Cyclical Feature Alignment with source/target, D$^2$CA utilizes the AU-altered images as input to anticipate the AU/domain features necessary for facial image regeneration and reconstruction. The primary objective here is to establish semantic alignment of AU features across different domains. Sub-figure (b) highlights the application of image-level contrastive learning, where a facial image and its reconstructed counterpart are considered as positive samples, while others are regarded as negative samples.
  • Figure 4: Illustration of feature-level contrastive learning (FCL). Domain features from the same identity are pulled close together.
  • Figure 5: Network configuration of the shared AU encoder $\mathcal{E}_{au}$. Typically, the private domain encoders utilize the same neural network structure.
  • ...and 8 more figures
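
The image-level contrastive learning described in the Figure 3(b) caption, where a facial image and its reconstruction form a positive pair and all other images serve as negatives, can be illustrated with a generic InfoNCE-style loss. The sketch below is an assumption-laden illustration, not the authors' implementation: the function name `info_nce`, the temperature of 0.1, the use of cosine similarity, and the random vectors standing in for face embeddings are all hypothetical choices that do not come from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact objective).

    Row i of `anchors` is paired with row i of `positives`; every other
    row of `positives` is treated as a negative for that anchor.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    # Row-wise log-softmax; the diagonal holds the positive-pair terms.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
# Hypothetical stand-ins: embeddings of 8 faces and of their reconstructions.
images = rng.normal(size=(8, 32))
recon = images + 0.05 * rng.normal(size=(8, 32))
# Deliberately mismatched pairs: each image is paired with another's reconstruction.
shuffled = np.roll(recon, 1, axis=0)

loss_matched = info_nce(images, recon)
loss_mismatched = info_nce(images, shuffled)
print(loss_matched, loss_mismatched)
```

In this toy setting the loss is small when each image is paired with its own reconstruction and large when the pairs are mismatched, which is the behaviour a contrastive objective of this kind is meant to reward. The feature-level variant in Figure 4 could presumably reuse the same loss with positives defined by shared identity, though the paper's exact formulation is not given in this excerpt.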