Progressive Multi-Source Domain Adaptation for Personalized Facial Expression Recognition

Muhammad Osama Zeeshan, Marco Pedersoli, Alessandro Lameiras Koerich, Eric Granger

TL;DR

This work tackles personalized facial expression recognition by reframing multi-source domain adaptation as a progressive, subject-level transfer problem. It introduces P-MSDA, which ranks source subjects by their similarity to an unlabeled target and gradually integrates only the most relevant ones, while maintaining a density-based replay memory to prevent catastrophic forgetting. The method combines curriculum learning principles with self-paced adaptation and pseudo-labeling (ACPL), plus a domain-alignment objective that includes MMD terms with a replay domain. Across BioVid, UNBC-McMaster, Aff-Wild2, BAH, and cross-dataset settings, P-MSDA consistently outperforms single-source and standard MSDA baselines, with strong gains and robustness in both CNN and ViT backbones. These results demonstrate improved personalization in FER and pain estimation under realistic, diverse conditions, with practical implications for deploying personalized FER systems while controlling computational costs.
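
The MMD terms mentioned above can be illustrated with a minimal sketch. It assumes an RBF kernel, the biased estimator of squared MMD, and a plain sum of a source-target term and a replay-target term. The kernel choice, the `gamma` bandwidth, and the function names (`rbf_kernel`, `mmd2`, `alignment_loss`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )


def alignment_loss(source, replay, target, gamma=1.0):
    """Assumed combination: one MMD term per (domain, target) pair."""
    return mmd2(source, target, gamma) + mmd2(replay, target, gamma)


rng = np.random.default_rng(0)
source = rng.normal(size=(50, 4))
target = source + 3.0                   # shifted copy: large discrepancy
replay = rng.normal(size=(20, 4))
print(mmd2(source, source, gamma=0.1))  # close to zero for identical sets
print(alignment_loss(source, replay, target, gamma=0.1))
```

In a training loop this quantity would be computed on mini-batch features and added to the supervised loss; the weighting between the terms is left out here because this section does not specify it.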

Abstract

Personalized facial expression recognition (FER) involves adapting a machine learning model using samples from labeled source domains and an unlabeled target domain. Given the challenges of recognizing subtle expressions with considerable interpersonal variability, state-of-the-art unsupervised domain adaptation (UDA) methods focus on the multi-source UDA (MSDA) setting, where each domain corresponds to a specific subject, to improve model accuracy and robustness. However, when adapting to a specific target, the diversity of the source domains translates to a large shift between source and target data. State-of-the-art MSDA methods for FER address this domain shift by using all the sources to adapt to the target representations. Nevertheless, adapting to a target subject remains challenging because of the large distributional differences between source and target domains, which often result in negative transfer. In addition, integrating all sources simultaneously increases computational costs and causes misalignment with the target. To address these issues, we propose a progressive MSDA approach that gradually introduces information from source subjects based on their similarity to the target subject. This ensures that only the sources most relevant to the target are selected, which helps avoid the negative transfer caused by dissimilar sources. We first exploit the closest sources to reduce the distribution shift with the target and then move towards the furthest, considering only the sources that pass a predetermined relevance threshold. Furthermore, to mitigate the catastrophic forgetting caused by the incremental introduction of source subjects, we implement a density-based memory mechanism that preserves the most relevant historical source samples for adaptation. We conduct extensive experiments on BioVid, UNBC-McMaster, Aff-Wild2, BAH, and in a cross-dataset setting.
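
The source-selection idea in the abstract (rank subjects by similarity to the target, keep those above a threshold) can be sketched as follows. This is only an illustration: it assumes cosine similarity between per-subject mean embeddings, and the function name `rank_sources`, the dictionary layout, and the default threshold are hypothetical rather than taken from the paper.

```python
import numpy as np


def rank_sources(source_embeddings, target_embeddings, threshold=0.5):
    """Rank source subjects by similarity to the target and filter them.

    source_embeddings: dict mapping subject id -> array of shape (n_i, d)
    target_embeddings: array of shape (n_t, d)
    Returns (selected ids, closest first; dict of similarity scores).
    """
    target_mean = target_embeddings.mean(axis=0)
    scores = {}
    for subject_id, emb in source_embeddings.items():
        source_mean = emb.mean(axis=0)
        denom = np.linalg.norm(source_mean) * np.linalg.norm(target_mean)
        # Assumed similarity measure: cosine between mean embeddings.
        scores[subject_id] = float(source_mean @ target_mean / denom) if denom > 0 else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = [sid for sid in ranked if scores[sid] >= threshold]
    return selected, scores


target = np.array([[1.0, 0.0], [1.0, 0.0]])
sources = {
    "subject_a": np.array([[1.0, 0.1]]),   # nearly aligned with the target
    "subject_b": np.array([[1.0, 1.0]]),   # moderately similar
    "subject_c": np.array([[-1.0, 0.0]]),  # opposite direction, filtered out
}
selected, scores = rank_sources(sources, target, threshold=0.5)
print(selected)
```

The returned order gives the curriculum: the first subject in `selected` would be adapted first, and subjects below the threshold are never introduced.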

Paper Structure

This paper contains 41 sections, 18 equations, 11 figures, 13 tables, 2 algorithms.

Figures (11)

  • Figure 1: Comparison between subject-based MSDA and our proposed progressive MSDA. (a) Subject-based MSDA aligns all source domains simultaneously with the target using a discrepancy-based, self-supervised, contrastive-learning, or adversarial approach. (b) Our progressive MSDA first ranks the source domains (subjects) by their similarity to the target domain and then adapts them gradually, optimizing the transfer process through sequential adaptation. Second, we construct a replay memory (domain) that retains key samples from previously adapted source domains, which are re-accessed after each adaptation. A discrepancy-based approach is applied to align the source and target domains.
  • Figure 2: Overview of our proposed progressive MSDA method for adaptation to the target subject. Source Selection Phase: We estimate a similarity matrix between each source embedding and the target embedding, then rank the source subjects from most to least similar. Progressive Domain Adaptation Phase: Ranked sources are progressively incorporated through iterative training steps (Train Step-1, Train Step-2, ..., Train Step-n). At each step, a new source subject is introduced and aligned with the target by computing discrepancy and supervised losses. The Augmented Confident Pseudo-label (ACPL) technique from Zeeshan et al. (2024) generates reliable pseudo-labels for the target. Finally, we create a replay dictionary using density-based selection to preserve relevant samples from previously visited sources.
  • Figure 3: Average accuracy on Aff-Wild2 and BAH datasets.
  • Figure 4: Comparison of replay sample selection strategies: No Preserve (only new subject), Preserve Random (fixed random samples), Closest k-means (cluster-based), Closest DBSCAN (100 samples/subject), and P-MSDA (Ours) (density-based pertinent samples).
  • Figure 5: t-SNE visualization of BioVid embeddings across source, replay, and target domains (Subject-1 and Subject-5). The replay domain preserves samples over training steps: the Initial Step retains more source data, while later steps select fewer but more pertinent samples, reducing distant-subject influence and enhancing target alignment.
  • ...and 6 more figures
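
The density-based replay selection named in the Figure 2 and Figure 4 captions can be illustrated with a rough sketch. It assumes a simple neighbor-count criterion (samples with enough neighbors within a radius are kept, densest first). This is not DBSCAN and not necessarily the paper's selection rule; `select_replay`, `eps`, `min_neighbors`, and `max_keep` are hypothetical names and parameters.

```python
import numpy as np


def select_replay(embeddings, eps=1.0, min_neighbors=3, max_keep=100):
    """Return indices of samples lying in dense regions of the embedding space.

    embeddings: array of shape (n, d) holding one source subject's features.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Count the other samples within radius eps (subtract 1 to exclude self).
    counts = (dists <= eps).sum(axis=1) - 1
    dense = np.flatnonzero(counts >= min_neighbors)
    # Keep the densest samples first, capped at max_keep per subject.
    order = dense[np.argsort(-counts[dense], kind="stable")]
    return order[:max_keep]


features = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],  # dense cluster
    [10.0, 10.0], [-10.0, 5.0],                                    # isolated samples
])
replay_idx = select_replay(features, eps=1.0, min_neighbors=3)
replay_memory = {"subject_a": features[replay_idx]}  # replay dictionary keyed by subject
print(replay_idx)
```

Under these assumptions, isolated samples are dropped and only the cluster members enter the replay dictionary, which is then revisited at later training steps alongside each newly introduced subject.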