Beyond Overfitting: Doubly Adaptive Dropout for Generalizable AU Detection
Yong Li, Yi Ren, Xuesong Niu, Yi Ding, Xiu-Shen Wei, Cuntai Guan
TL;DR
The paper addresses cross-domain generalization in facial Action Unit (AU) detection by introducing AUDD, a doubly adaptive dropout framework that suppresses domain-sensitive information at both the channel and token levels. It combines a CNN-Transformer backbone with Channel Drop Units and Token Drop Units, guided by a domain classifier with Gradient Reversal, to learn domain-invariant, AU-discriminative representations, and it is trained with a progressive dropout strategy. Empirical results on BP4D, BP4D+, DISFA, and GFT show that AUDD outperforms unsupervised domain adaptation baselines and other dropout methods, and attention and feature analyses support improved AU localization and relationship modeling. The approach offers a practical, annotation-efficient path to robust cross-domain AU detection; incorporating vision foundation models and language-based AU definitions is left to future work for further gains.
Abstract
Facial Action Units (AUs) are essential for conveying psychological states and emotional expressions. While automatic AU detection systems leveraging deep learning have progressed, they often overfit to specific datasets and individual features, limiting their cross-domain applicability. To overcome these limitations, we propose a doubly adaptive dropout approach for cross-domain AU detection, which enhances the robustness of convolutional feature maps and spatial tokens against domain shifts. This approach includes a Channel Drop Unit (CD-Unit) and a Token Drop Unit (TD-Unit), which work together to reduce domain-specific noise at both the channel and token levels. The CD-Unit preserves domain-agnostic local patterns in feature maps, while the TD-Unit helps the model identify AU relationships generalizable across domains. An auxiliary domain classifier, integrated at each layer, guides the selective omission of domain-sensitive features. To prevent excessive feature dropout, a progressive training strategy is used, allowing for selective exclusion of sensitive features at any model layer. Our method consistently outperforms existing techniques in cross-domain AU detection, as demonstrated by extensive experimental evaluations. Visualizations of attention maps also highlight clear and meaningful patterns related to both individual and combined AUs, further validating the approach's effectiveness.
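To make the idea of selectively omitting domain-sensitive channels concrete, the following is a minimal sketch and not the authors' implementation. It assumes each channel already has a scalar domain-sensitivity score (in the paper this guidance comes from the auxiliary domain classifier; how the score is derived is not specified here). It also assumes a deterministic top-k drop rule, which may differ from the actual CD-Unit. The names `channel_drop_mask`, `apply_channel_mask`, `sensitivity`, and `drop_ratio` are hypothetical.

```python
from typing import List, Sequence


def channel_drop_mask(sensitivity: Sequence[float], drop_ratio: float) -> List[int]:
    """Return a 0/1 keep-mask that drops the most domain-sensitive channels.

    sensitivity: one assumed score per channel (higher = more domain-specific).
    drop_ratio:  fraction of channels to drop, in [0, 1].
    """
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be in [0, 1]")
    num_drop = int(len(sensitivity) * drop_ratio)
    # Rank channel indices from most to least domain-sensitive.
    ranked = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    dropped = set(ranked[:num_drop])
    return [0 if i in dropped else 1 for i in range(len(sensitivity))]


def apply_channel_mask(
    feature_map: Sequence[Sequence[float]], mask: Sequence[int]
) -> List[List[float]]:
    """Zero every value in dropped channels; kept channels pass through unchanged."""
    return [[v * keep for v in channel] for channel, keep in zip(feature_map, mask)]


# Toy feature map: 4 channels, 2 spatial positions each (values are illustrative).
feature_map = [[0.5, 1.0], [2.0, 3.0], [0.1, 0.2], [4.0, 5.0]]
# Assumed per-channel domain-sensitivity scores.
sensitivity = [0.9, 0.1, 0.7, 0.3]

mask = channel_drop_mask(sensitivity, drop_ratio=0.5)
masked = apply_channel_mask(feature_map, mask)
```

With these toy values, the two highest-scoring channels (indices 0 and 2) are zeroed and the other two are kept. The sketch omits the rescaling that standard dropout applies to surviving activations, and it does not model the token-level TD-Unit or the progressive training strategy.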
