Graph-Attention Network with Adversarial Domain Alignment for Robust Cross-Domain Facial Expression Recognition

Razieh Ghaedi, AmirReza BabaAhmadi, Reyer Zwiggelaar, Xinqi Fan, Nashid Alam

TL;DR

The paper tackles cross-domain facial expression recognition under substantial domain shift by introducing GAT-ADA, a hybrid model that couples a ResNet-50 backbone with a batch-level Graph Attention Network to model inter-sample relations. It integrates adversarial (GRL) and statistical (CORAL, MMD) domain alignment losses to achieve robust cross-domain feature invariance. Across six FER benchmarks under a unified AGRA protocol, GAT-ADA delivers state-of-the-art mean accuracy (74.39%) and exceptional transfer performance (e.g., RAF-DB → FER2013 = 98.04%), while maintaining computational efficiency. The work highlights the value of inter-sample relational modeling and multi-component alignment for practical, real-world CD-FER, with potential applicability to other cross-domain learning tasks.

Abstract

Cross-domain facial expression recognition (CD-FER) remains difficult due to severe domain shift between training and deployment data. We propose Graph-Attention Network with Adversarial Domain Alignment (GAT-ADA), a hybrid framework that couples a ResNet-50 backbone with a batch-level Graph Attention Network (GAT) to model inter-sample relations under shift. Each mini-batch is cast as a sparse ring graph so that attention aggregates cross-sample cues that are informative for adaptation. To align distributions, GAT-ADA combines adversarial learning via a Gradient Reversal Layer (GRL) with statistical alignment using CORAL and MMD. GAT-ADA is evaluated under a standard unsupervised domain adaptation protocol: it is trained on one labeled source (RAF-DB) and adapted to multiple unlabeled targets (CK+, JAFFE, SFEW 2.0, FER2013, and ExpW). GAT-ADA attains 74.39% mean cross-domain accuracy. On RAF-DB to FER2013, it reaches 98.0% accuracy, an improvement of approximately 36 points over the best baseline we re-implemented with the same backbone and preprocessing.
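
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows one plausible way to build ring-graph edges over a mini-batch and the textbook forms of the CORAL and (squared, RBF-kernel) MMD statistics. The function names (`ring_edges`, `coral_loss`, `mmd2`), the `hops` neighbourhood size, and the kernel bandwidth `gamma` are illustrative assumptions; the abstract does not specify them. The GAT attention layer and the GRL adversarial term are not covered here.

```python
import math

def ring_edges(batch_size, hops=1):
    """Directed edges of a ring graph over the samples of one mini-batch.

    Assumption: sample i is linked to its `hops` nearest neighbours on each
    side, with wrap-around. The paper's exact connectivity is not given here.
    """
    edges = set()
    for i in range(batch_size):
        for h in range(1, hops + 1):
            for j in ((i + h) % batch_size, (i - h) % batch_size):
                if j != i:  # skip self-loops that appear in tiny batches
                    edges.add((i, j))
    return sorted(edges)

def covariance(rows):
    """Unbiased feature covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def coral_loss(source, target):
    """CORAL: squared Frobenius distance between covariances, over 4*d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return sq / (4 * d * d)

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD with an RBF kernel (gamma is assumed)."""
    def mean_k(xs, ys):
        return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)

# Toy usage: 4 samples per domain, 2-dimensional features.
src = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, -1.0]]
tgt = [[3.0, 0.5], [5.0, 2.5], [7.0, 0.5], [5.0, -1.5]]
print(ring_edges(4))
print(coral_loss(src, tgt), mmd2(src, tgt))
```

In a training loop these two statistics would typically be computed on backbone features of source and target mini-batches and added, with weights, to the classification and adversarial losses; the weighting scheme is not stated in the abstract and is left out of this sketch.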

Paper Structure

This paper contains 21 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Facial expression examples.
  • Figure 2: Pipeline of the proposed GAT-ADA framework.
  • Figure 3: Confusion matrices for ResNet-50: (a) RAF-DB to FER2013, (b) RAF-DB to CK+.
  • Figure 4: Confusion matrices for ResNet-18: (a) RAF-DB to FER2013, (b) RAF-DB to CK+.
  • Figure 5: Performance–efficiency summary across methods. Left: radar chart with cost axes inverted (larger area indicates a better overall profile). Right: bar chart of individual metrics. GAT-ADA shows the strongest balance: high mean accuracy with markedly lower FLOPs, latency, and memory, which supports real-time, resource-constrained deployment.