
Beyond Overfitting: Doubly Adaptive Dropout for Generalizable AU Detection

Yong Li, Yi Ren, Xuesong Niu, Yi Ding, Xiu-Shen Wei, Cuntai Guan

TL;DR

The paper addresses cross-domain generalization in facial Action Unit (AU) detection by introducing AUDD, a doubly adaptive dropout framework that suppresses domain-sensitive information at both the channel and token levels. It combines a CNN-Transformer backbone with Channel Drop Units and Token Drop Units. These units are guided by a domain classifier with Gradient Reversal so that the model learns domain-invariant, AU-discriminative representations, and the whole model is trained with a progressive dropout strategy. Empirical results on BP4D, BP4D+, DISFA, and GFT show that AUDD outperforms unsupervised domain adaptation baselines and other dropout methods, and attention and feature analyses support improved AU localization and relationship modeling. The approach offers a practical, annotation-efficient path to robust cross-domain AU detection. Incorporating vision foundation models and language-based AU definitions for further gains is left for future work.
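The domain classifier is attached through Gradient Reversal. The standard Gradient Reversal Layer (GRL) comes from DANN (Ganin and Lempitsky). It is the identity in the forward pass. In the backward pass it multiplies the gradient by −λ. As a result, minimizing the domain-classification loss trains the classifier while pushing the feature extractor toward features that confuse it. The sketch below shows only that rule, written by hand on plain Python lists rather than with an autograd framework. The λ ramp in `dann_lambda` is the schedule from the DANN paper. It is included as an assumption, since this summary does not state which schedule AUDD uses.

```python
import math


class GradientReversal:
    """Gradient reversal layer written out by hand (no autograd framework).

    forward: identity, so the domain classifier sees the features unchanged.
    backward: multiplies the incoming gradient by -lambda_, so the feature
    extractor is updated to *increase* the domain-classification loss.
    """

    def __init__(self, lambda_: float = 1.0) -> None:
        self.lambda_ = lambda_

    def forward(self, features: list[float]) -> list[float]:
        return list(features)

    def backward(self, grad_output: list[float]) -> list[float]:
        return [-self.lambda_ * g for g in grad_output]


def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """DANN-style ramp (an assumption here): 0 at progress 0, close to 1 at progress 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


grl = GradientReversal(lambda_=0.5)
print(grl.forward([1.0, -2.0]))   # [1.0, -2.0]
print(grl.backward([0.2, -0.4]))  # [-0.1, 0.2]
```

In a framework with automatic differentiation, the same rule is usually wrapped as a custom autograd function. The manual `backward` here just makes the sign flip explicit.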

Abstract

Facial Action Units (AUs) are essential for conveying psychological states and emotional expressions. While automatic AU detection systems leveraging deep learning have progressed, they often overfit to specific datasets and individual features, limiting their cross-domain applicability. To overcome these limitations, we propose a doubly adaptive dropout approach for cross-domain AU detection, which enhances the robustness of convolutional feature maps and spatial tokens against domain shifts. This approach includes a Channel Drop Unit (CD-Unit) and a Token Drop Unit (TD-Unit), which work together to reduce domain-specific noise at both the channel and token levels. The CD-Unit preserves domain-agnostic local patterns in feature maps, while the TD-Unit helps the model identify AU relationships generalizable across domains. An auxiliary domain classifier, integrated at each layer, guides the selective omission of domain-sensitive features. To prevent excessive feature dropout, a progressive training strategy is used, allowing for selective exclusion of sensitive features at any model layer. Our method consistently outperforms existing techniques in cross-domain AU detection, as demonstrated by extensive experimental evaluations. Visualizations of attention maps also highlight clear and meaningful patterns related to both individual and combined AUs, further validating the approach's effectiveness.
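The abstract combines two ideas. First, each drop unit removes the channels (CD-Unit) or spatial tokens (TD-Unit) judged most domain-sensitive. Second, a progressive strategy prevents too much from being dropped during training. The sketch below is a minimal, framework-free illustration that rests on several assumptions:

- The per-unit sensitivity scores are given as input; how the domain classifier produces them is not specified in this summary.
- The number of dropped units is a fixed fraction of all units.
- The linear warm-up in `progressive_drop_ratio` is a hypothetical schedule, not necessarily the paper's.

The same `drop_mask`/`apply_mask` pair covers both units, because a channel (a flattened H×W map) and a token (an embedding vector) are each just a vector that is either kept or zeroed.

```python
from typing import Sequence


def progressive_drop_ratio(epoch: int, warmup_epochs: int, max_ratio: float) -> float:
    """Hypothetical linear warm-up: 0 at epoch 0, max_ratio from warmup_epochs on."""
    if warmup_epochs <= 0:
        return max_ratio
    return max_ratio * min(epoch / warmup_epochs, 1.0)


def drop_mask(scores: Sequence[float], ratio: float) -> list[int]:
    """Keep-mask (1 = keep, 0 = drop) that drops the highest-scoring units.

    A higher score means "more domain-sensitive". The number of dropped
    units is floor(len(scores) * ratio); ties keep the lower index first.
    """
    k = int(len(scores) * ratio)
    # Stable sort by descending score, so equal scores are ranked by index.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:k])
    return [0 if i in dropped else 1 for i in range(len(scores))]


def apply_mask(units: Sequence[Sequence[float]], mask: Sequence[int]) -> list[list[float]]:
    """Zero out dropped units: channels (flattened H*W maps) or token vectors."""
    return [list(unit) if keep else [0.0] * len(unit) for unit, keep in zip(units, mask)]


# Example with hypothetical numbers: 4 channels, drop ratio fully warmed up to 0.25.
channels = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
scores = [0.1, 0.9, 0.4, 0.2]
mask = drop_mask(scores, progressive_drop_ratio(10, 10, 0.25))
print(mask)                        # [1, 0, 1, 1]
print(apply_mask(channels, mask))  # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0], [7.0, 8.0]]
```

At epoch 0 the ratio is 0, so nothing is dropped. This mirrors the stated goal of avoiding excessive feature dropout early in training.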

Paper Structure

This paper contains 12 sections, 5 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our proposed AUDD shows its superiority across 8 cross-domain AU detection scenarios. Here B, B+, D, and G denote the BP4D, BP4D+, DISFA, and GFT datasets, respectively. Baseline denotes evaluating the source-trained model directly on the target dataset. Further comparisons can be found in the ablation study section.
  • Figure 2: (a) The framework of the proposed AUDD for cross-domain AU detection. AUDD takes as input a hybrid batch of source and target images. It encodes them with three CNN-based residual blocks followed by three transformer blocks (see the hybrid CNN-Transformer backbone section). To automatically maximize the preservation of domain-agnostic, generalizable channels and tokens, AUDD applies the CD-Unit and TD-Unit concurrently for adaptive feature weight generation and dropping (see the mask generation section). Each channel/token is assigned a score that determines whether it is domain-sensitive (Sec. III.C).
  • Figure 3: Domain discrepancy comparisons between AUDD and the Baseline method under various cross-domain settings. Features encoded by AUDD consistently exhibit lower sensitivity to domain shifts.
  • Figure 4: Domain classification performance w.r.t. different blocks (Source: BP4D, Target: DISFA). AUDD shows consistently low domain classification accuracy, indicating that domain-related features have been effectively mitigated.
  • Figure 5: t-SNE visualization of the learned features. AUDD shows the best feature discrimination w.r.t. AU6. Red and blue points indicate samples with AU6 present and absent, respectively.
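Figures 3 and 4 probe domain invariance in two ways. One is a discrepancy measure between source and target features. The other is the accuracy of a domain classifier attached at each block. The captions do not name the discrepancy metric, so the sketch below substitutes a common choice: squared maximum mean discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the source and target feature means. `domain_accuracy` and `excess_over_chance` are hypothetical helpers for reading a plot like Figure 4. A domain classifier whose accuracy sits near the majority-class rate (0.5 for a balanced two-domain batch) is finding little domain information in the features it probes.

```python
from typing import Sequence


def mean_vector(features: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def linear_mmd(source: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Squared MMD with a linear kernel, i.e. ||mean(source) - mean(target)||^2."""
    return sum((s - t) ** 2 for s, t in zip(mean_vector(source), mean_vector(target)))


def domain_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose domain label (0 = source, 1 = target) is predicted correctly."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def excess_over_chance(accuracy: float, actual: Sequence[int]) -> float:
    """Accuracy minus the majority-class rate; values near 0 suggest domain-invariant features."""
    ones = sum(1 for a in actual if a == 1)
    majority_rate = max(ones, len(actual) - ones) / len(actual)
    return accuracy - majority_rate


source = [[0.0, 0.0], [2.0, 2.0]]   # mean [1.0, 1.0]
target = [[3.0, 1.0], [3.0, 1.0]]   # mean [3.0, 1.0]
print(linear_mmd(source, target))   # 4.0
acc = domain_accuracy([0, 1, 1, 0], [0, 1, 0, 1])
print(acc, excess_over_chance(acc, [0, 1, 0, 1]))  # 0.5 0.0
```

A linear kernel only compares first moments. Kernel MMD with, for example, a Gaussian kernel would also capture differences in higher-order statistics, at the cost of pairwise kernel evaluations.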