Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation

Jiankun Zhu, Sicheng Zhao, Jing Jiang, Wenbo Tang, Zhaopan Xu, Tingting Han, Pengfei Xu, Hongxun Yao

TL;DR

This work tackles source-free domain adaptation for visual emotion recognition (SFDA-VER), addressing privacy constraints by not accessing source data during adaptation. It introduces Bridge then Begin Anew (BBA), a two-stage framework: Domain-bridged Model Generation (DMG) generates a bridge model to yield reliable pseudo-labels, and Target-related Model Adaptation (TMA) trains a target model from scratch under guidance from the bridge, augmented with masking, clustering, and emotion polarity losses. The approach yields substantial improvements over state-of-the-art SFDA methods and competes with or exceeds several unsupervised domain adaptation baselines across six VER settings, demonstrating robust cross-domain transfer under large affective gaps. Overall, BBA enables privacy-preserving VER deployment by reducing dependence on source data while maintaining strong performance and encouraging target-domain feature discovery.

Abstract

Visual emotion recognition (VER), which aims at understanding humans' emotional reactions toward different visual stimuli, has attracted increasing attention. Given the subjective and ambiguous characteristics of emotion, annotating a reliable large-scale dataset is hard. To reduce reliance on data labeling, domain adaptation offers an alternative solution by adapting models trained on labeled source data to unlabeled target data. Conventional domain adaptation methods require access to source data. However, due to privacy concerns, source emotional data may be inaccessible. To address this issue, we introduce a previously unexplored task: source-free domain adaptation (SFDA) for VER, in which source data is not accessible during the adaptation process. To this end, we propose a novel framework termed Bridge then Begin Anew (BBA), which consists of two steps: domain-bridged model generation (DMG) and target-related model adaptation (TMA). First, the DMG bridges cross-domain gaps by generating an intermediate model, avoiding direct alignment between two VER datasets with significant differences. Then, the TMA begins training the target model anew to fit the target structure, avoiding the influence of source-specific knowledge. Extensive experiments are conducted on six SFDA settings for VER. The results demonstrate the effectiveness of BBA, which achieves remarkable performance gains compared with state-of-the-art SFDA methods and outperforms representative unsupervised domain adaptation approaches.

Paper Structure

This paper contains 18 sections, 18 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Illustration of challenges for SFDA-VER tasks. (a) shows the pseudo-label confidence generated by SHOT on the VER dataset and the standard classification dataset OfficeHome. Differently colored dots indicate prediction probabilities for images in different categories. (b) shows the distributions of ResNet-101 features on the VER and OfficeHome datasets.
  • Figure 2: An overview comparison between conventional SFDA methods (a) and our method (b). In conventional methods (a), the source domain model is directly fine-tuned to align the source and target domains. This direct adaptation approach can be problematic due to significant differences between the source and target domains, potentially leading to suboptimal performance. In contrast, our Bridge then Begin Anew (BBA) approach (b) introduces a bridge model to generate more reliable pseudo-labels and stimulates the exploration of target domain-specific knowledge.
  • Figure 3: Illustration of our masking strategy for feature enhancement. The bridge model $\phi_b$, guided by the source model $\phi_s$, predicts both original and masked target domain images to compute self-labeling loss $\mathcal{L}_{sl}$ in Eq. (\ref{eq:sl}) and distillation loss $\mathcal{L}_{kd}$ in Eq. (\ref{eq:kd}), respectively.
  • Figure 4: Numerical examples to validate the effectiveness of the proposed emotion polarity loss. The addition of emotion polarity loss to IM loss $\mathcal{L}_{im}$ and SL loss $\mathcal{L}_{sl}$ allows the model to differentiate samples more finely.
  • Figure 5: Parameter analysis on EmoSet $\rightarrow$ FI (best viewed in color). (a): Analysis on $\lambda$. (b): Analysis on $\gamma$. (c): Analysis on $\delta$. (d): Accuracy during the training process. (e): Ablation study on EmoSet $\rightarrow$ FI. (f): Ablation study on FI $\rightarrow$ EmoSet.
  • ...and 1 more figure
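
The Figure 4 caption says that adding an emotion polarity loss to the IM loss $\mathcal{L}_{im}$ and the SL loss $\mathcal{L}_{sl}$ lets the model differentiate samples more finely. The sketch below gives a small numerical illustration of that idea. It is not the paper's implementation: the exact loss definitions are not given in this excerpt. It assumes the information maximization loss in its common SHOT-style form (mean prediction entropy minus entropy of the batch-mean prediction), the eight Mikels emotion categories with which FI and EmoSet are annotated, and a hypothetical polarity term that sums class probabilities within each polarity and applies cross-entropy against the polarity of the pseudo-label. All function names are illustrative.

```python
import math

# Mikels' eight emotion categories, grouped by polarity (assumed label order).
POSITIVE = ("amusement", "awe", "contentment", "excitement")
NEGATIVE = ("anger", "disgust", "fear", "sadness")
CLASSES = POSITIVE + NEGATIVE


def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(q * math.log(q + eps) for q in p)


def im_loss(probs):
    """SHOT-style information maximization loss over a batch (assumed form).

    probs: list of per-sample class-probability lists.
    Lower is better: confident per-sample predictions, diverse batch average.
    """
    n = len(probs)
    k = len(probs[0])
    mean_entropy = sum(entropy(p) for p in probs) / n
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    return mean_entropy - entropy(marginal)


def polarity_probs(p):
    """Collapse 8-way emotion probabilities into (positive, negative) mass."""
    pos = sum(p[:len(POSITIVE)])
    neg = sum(p[len(POSITIVE):])
    return pos, neg


def fine_grained_ce(p, pseudo_label, eps=1e-12):
    """Cross-entropy of the 8-way prediction against a hard pseudo-label."""
    return -math.log(p[pseudo_label] + eps)


def polarity_ce(p, pseudo_label, eps=1e-12):
    """Hypothetical polarity term: cross-entropy on the aggregated polarity."""
    pos, neg = polarity_probs(p)
    label_is_positive = pseudo_label < len(POSITIVE)
    return -math.log((pos if label_is_positive else neg) + eps)


# Two samples pseudo-labeled "amusement" with the same confidence (0.4).
label = CLASSES.index("amusement")
same_polarity = [0.4, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0]   # rest is positive
cross_polarity = [0.4, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.0]  # rest is negative

for name, p in (("same", same_polarity), ("cross", cross_polarity)):
    print(name, round(fine_grained_ce(p, label), 3), round(polarity_ce(p, label), 3))
```

Under these assumptions, the 8-way cross-entropy is identical for the two samples, because both put 0.4 on the pseudo-labeled class. The polarity term separates them: it is near zero when the remaining mass stays among positive emotions, and large when that mass falls on negative emotions.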