Learning Contrastive Feature Representations for Facial Action Unit Detection
Ziqiao Shang, Bin Liu, Fengmao Lv, Fei Teng, Tianrui Li, Lan-Zhe Guo
TL;DR
This work tackles facial action unit (AU) detection under two core challenges: severe per-AU class imbalance and noisy or false AU annotations. It introduces AUNCE, a discriminative contrastive learning loss that blends self-supervised and supervised signals to emphasize differential AU information rather than full-face pixel cues. AUNCE incorporates a negative sample re-weighting scheme to prioritize minority AUs and a four-type positive-sample sampling strategy, which includes self-supervised signals and class centroids, to mitigate label noise. Extensive experiments on BP4D, DISFA, BP4D+, GFT, and Aff-Wild2 demonstrate state-of-the-art performance and strong cross-dataset generalization, with ablations validating each component’s contribution. The approach offers a robust, efficient direction for AU detection in both constrained and in-the-wild settings, and the code is publicly available for replication.
Abstract
For the Facial Action Unit (AU) detection task, accurately capturing the subtle facial differences between distinct AUs is essential for reliable detection. AU detection is further hampered by class imbalance and by noisy or false labels, both of which undermine detection accuracy. In this paper, we introduce a novel contrastive learning framework designed for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features. To tackle the class imbalance issue, we employ a negative sample re-weighting strategy that adjusts the step size of parameter updates for minority- and majority-class samples. Moreover, to address the challenges posed by noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Our experiments on five widely used benchmark datasets (BP4D, DISFA, BP4D+, GFT and Aff-Wild2) demonstrate the superior performance of our approach compared to state-of-the-art AU detection methods. Our code is available at https://github.com/Ziqiao-Shang/AUNCE.
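As a rough illustration of the negative sample re-weighting idea mentioned above, the sketch below shows a generic InfoNCE-style contrastive loss in which each negative sample carries its own weight. It is not the AUNCE loss itself: the function name `weighted_info_nce`, the `neg_weights` argument, and the temperature value are assumptions made for illustration, and how the weights would be derived from AU class frequencies is left open.

```python
import numpy as np


def weighted_info_nce(anchor, positive, negatives, neg_weights, temperature=0.5):
    """Generic InfoNCE-style loss with per-negative weights (illustrative only).

    anchor, positive: feature vectors of shape (d,)
    negatives:        feature matrix of shape (k, d)
    neg_weights:      non-negative weights of shape (k,); hypothetical input,
                      e.g. larger values where negatives should push harder
    """
    def normalize(x):
        # L2-normalize so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))
    w = np.asarray(neg_weights, dtype=float)

    pos_term = np.exp(a @ p / temperature)            # scalar
    neg_term = np.sum(w * np.exp(n @ a / temperature))  # weighted sum over k negatives

    # Standard InfoNCE is recovered when every weight equals 1.
    return -np.log(pos_term / (pos_term + neg_term))
```

Under these assumptions, raising the weights on the negatives enlarges their share of the denominator and therefore the gradient they contribute, which is one simple way to vary the effective update step size across samples.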
