Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition

Yuefang Gao, Yuhao Xie, Zeke Zexi Hu, Tianshui Chen, Liang Lin

TL;DR

This work tackles cross-domain facial expression recognition (CD-FER) under substantial domain shift by introducing AGLRLS, an adaptive framework that separates global and local adversarial learning, augments training with semantic-aware, feature-level pseudo labels, and applies a global-local prediction-consistency mechanism at inference. The method uses seven feature streams (one global stream, five local streams based on facial landmarks, and their concatenation) with seven classifiers and seven discriminators, optimized through two-player objectives in an end-to-end training regime. A key component is feature-level pseudo-label generation with adaptive, class-aware thresholds (IDTS), which counters label imbalance and improves discriminability in the target domain; a dynamic fusion strategy (GLPC) then selects the final prediction. Experiments on six datasets with multiple backbones report state-of-the-art performance, ablations confirm the contributions of SAL, FPLG, and GLPC, and Friedman tests indicate that the gains are statistically significant. The approach offers a practical path toward robust CD-FER in real-world, cross-domain settings, and the code and models are publicly available for reproducibility.
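The TL;DR does not spell out how IDTS computes its class-aware thresholds. As a rough illustration of the general idea only, the sketch below swaps in a simplified FreeMatch-style self-adaptive rule (fixed global threshold, no moving averages): a global threshold `tau` is scaled per class by that class's mean predicted probability, normalized by the largest class mean, so classes the model rarely predicts get a lower bar. The function names, `tau = 0.9`, and the toy probabilities are assumptions for illustration, not the paper's IDTS.

```python
from typing import List, Optional, Sequence


def argmax(p: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(p)), key=lambda k: p[k])


def class_wise_thresholds(probs: Sequence[Sequence[float]], tau: float = 0.9) -> List[float]:
    """Scale a global threshold tau per class by the class's normalized mean probability.

    Assumption (FreeMatch-style, simplified): classes the model currently favors
    keep a threshold close to tau, while rarer or harder classes get a lower one,
    so they still receive some pseudo labels.
    """
    num_classes = len(probs[0])
    mean_prob = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    top = max(mean_prob)
    return [tau * m / top for m in mean_prob]


def assign_pseudo_labels(
    probs: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> List[Optional[int]]:
    """Keep the arg-max class only if its probability clears that class's threshold."""
    labels: List[Optional[int]] = []
    for p in probs:
        c = argmax(p)
        labels.append(c if p[c] >= thresholds[c] else None)  # None = no pseudo label
    return labels


# Hypothetical softmax outputs for five target-domain samples over three classes.
probs = [
    [0.95, 0.03, 0.02],
    [0.92, 0.05, 0.03],
    [0.80, 0.15, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
]
thresholds = class_wise_thresholds(probs, tau=0.9)
labels = assign_pseudo_labels(probs, thresholds)
print([round(t, 3) for t in thresholds], labels)
```

In this toy batch, the third sample (0.80 for the dominant class) is rejected because class 0 keeps the full threshold of 0.9, while the last two samples (0.70 for less frequent classes) are accepted under their lower class-wise thresholds.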

Abstract

Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER) because data distributions vary across domains. Current works mainly focus on learning domain-invariant features through global feature adaptation, while neglecting the transferability of local features. Additionally, these methods lack discriminative supervision during training on target datasets, which degrades feature representations in the target domain. To address these limitations, we propose an Adaptive Global-Local Representation Learning and Selection (AGLRLS) framework. The framework incorporates global-local adversarial adaptation and semantic-aware pseudo label generation to enhance the learning of domain-invariant and discriminative features during training. Meanwhile, global-local prediction consistency learning is introduced to improve classification results during inference. Specifically, the framework consists of separate global and local adversarial learning modules that learn domain-invariant global and local features independently. We also design a semantic-aware pseudo label generation module, which computes semantic labels based on global and local features. Moreover, a novel dynamic threshold strategy learns optimal thresholds from the independent predictions of global and local features, filtering out unreliable pseudo labels while retaining reliable ones. These labels are used to optimize the model through the adversarial learning process in an end-to-end manner. During inference, a global-local prediction consistency module automatically learns an optimal result from multiple predictions. We conduct comprehensive experiments and analysis on a fair evaluation benchmark. The results demonstrate that the proposed framework outperforms current competing methods by a substantial margin.
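The abstract says only that the global-local prediction consistency module "automatically learns an optimal result from multiple predictions"; it does not give the rule. The sketch below is a hypothetical, non-learned stand-in that shows only the shape of such a selection step: one probability vector per stream in, one class index out. The majority-vote rule with a most-confident-stream fallback, the function names, and the assumed stream order (global, five local, concatenated) are illustrative assumptions, not the paper's GLPC.

```python
from collections import Counter
from typing import Sequence


def argmax(p: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(p)), key=lambda k: p[k])


def select_prediction(stream_probs: Sequence[Sequence[float]]) -> int:
    """Pick one label from per-stream class probabilities.

    Hypothetical rule: return the class that a strict majority of streams
    predict; if no class has a majority, fall back to the arg-max of the
    single most confident stream.
    """
    votes = Counter(argmax(p) for p in stream_probs)
    label, count = votes.most_common(1)[0]
    if 2 * count > len(stream_probs):
        return label
    most_confident = max(stream_probs, key=max)  # stream with the highest top probability
    return argmax(most_confident)


# Seven hypothetical streams (assumed order: global, five local, concatenated), three classes.
agree = [[0.2, 0.7, 0.1]] * 4 + [[0.6, 0.3, 0.1]] * 3
split = (
    [[0.5, 0.3, 0.2]] * 3
    + [[0.2, 0.6, 0.2]] * 2
    + [[0.1, 0.3, 0.6], [0.05, 0.05, 0.9]]
)
print(select_prediction(agree), select_prediction(split))
```

In `agree`, four of seven streams vote for class 1, a strict majority. In `split`, no class reaches four votes, so the rule defers to the most confident stream, which predicts class 2. A learned module would presumably replace this fixed rule with trained weights.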

Paper Structure (27 sections, 23 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Illustration of the training and inference stages of our proposed AGLRLS model.
  • Figure 2: Bar plot of the category distribution in the training datasets.
  • Figure 3: Friedman test chart. The left half shows the case $\alpha = 0.05$ and the right half the case $\alpha = 0.10$. The horizontal coordinate of each algorithm's center point is its average rank; the lower the value, the better the performance. The horizontal line extending to both sides of each center point represents the critical difference (CD). If the horizontal lines of two algorithms do not overlap, their performance differs significantly.
  • Figure 4: Bar charts of the three methods' performance under the four configurations.
  • Figure 5: Ablation analysis of FPLG. (a) and (b) show the percentages of pseudo labels and of reliable pseudo labels that three different pseudo-label generation strategies can generate. (c) shows the cross-entropy loss on target-domain data computed with the pseudo labels generated by IDTS. (d) shows, for each of the three strategies, the share of each category among all generated pseudo labels. (e) shows the mean accuracy of the FPLG module under the three strategies.
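The Friedman chart in Figure 3 follows the usual recipe for comparing several methods over several datasets: rank the methods on each dataset (rank 1 = best), average the ranks, compute the Friedman statistic, and compare pairs of average ranks against a critical difference. The sketch below assumes the Nemenyi post-hoc test, $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ methods and $N$ datasets, with the $q_\alpha$ values tabulated by Demšar (2006). The caption does not say which post-hoc test the paper uses, and the accuracies in the example are made up, not results from the paper.

```python
import math
from typing import Dict, List, Sequence

# Nemenyi critical values q_alpha (Demsar, 2006), keyed by alpha, then by number of methods k.
Q_ALPHA: Dict[float, Dict[int, float]] = {
    0.05: {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
           7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164},
    0.10: {2: 1.645, 3: 2.052, 4: 2.291, 5: 2.459, 6: 2.589,
           7: 2.693, 8: 2.780, 9: 2.855, 10: 2.920},
}


def ranks_desc(scores: Sequence[float]) -> List[float]:
    """Rank scores so the highest gets rank 1; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def average_ranks(results: Sequence[Sequence[float]]) -> List[float]:
    """results[d][m] is the accuracy of method m on dataset d; returns per-method mean rank."""
    per_dataset = [ranks_desc(row) for row in results]
    n = len(results)
    return [sum(r[m] for r in per_dataset) / n for m in range(len(results[0]))]


def friedman_chi2(avg_ranks: Sequence[float], n_datasets: int) -> float:
    """Friedman chi-square statistic from average ranks (k - 1 degrees of freedom)."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )


def critical_difference(k: int, n_datasets: int, alpha: float = 0.05) -> float:
    """Nemenyi critical difference for k methods compared on n_datasets datasets."""
    return Q_ALPHA[alpha][k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Made-up accuracies: 4 datasets (rows) x 3 methods (columns).
results = [
    [80.0, 75.0, 70.0],
    [82.0, 78.0, 79.0],
    [60.0, 61.0, 55.0],
    [70.0, 65.0, 65.0],
]
avg = average_ranks(results)
print(avg, friedman_chi2(avg, len(results)), critical_difference(3, len(results)))
```

Here methods 0 and 2 differ by 1.375 in average rank, which is below the CD of about 1.66 at $\alpha = 0.05$, so with only four datasets this sketch would not call them significantly different; more datasets shrink the CD.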