Bridging the Gap for Test-Time Multimodal Sentiment Analysis

Zirun Guo, Tao Jin, Wenlong Xu, Wang Lin, Yangyang Wu

TL;DR

The paper proposes two strategies, Contrastive Adaptation and Stable Pseudo-label generation (CASP), for test-time adaptation in multimodal sentiment analysis (MSA). They address distribution shifts by enforcing consistency and minimizing empirical risk, respectively.

Abstract

Multimodal sentiment analysis (MSA) is an emerging research topic that aims to understand and recognize human sentiment or emotions through multiple modalities. However, in real-world dynamic scenarios, the distribution of the target data changes constantly and differs from that of the source data used to train the model, which leads to performance degradation. Common adaptation methods usually need source data, which could pose privacy issues or storage overheads. Therefore, test-time adaptation (TTA) methods are introduced to improve the performance of the model at inference time. Existing TTA methods are based on probabilistic models and unimodal learning, and thus cannot be applied to MSA, which is often treated as a multimodal regression task. In this paper, we propose two strategies, Contrastive Adaptation and Stable Pseudo-label generation (CASP), for test-time adaptation in multimodal sentiment analysis. The two strategies deal with the distribution shifts for MSA by enforcing consistency and minimizing empirical risk, respectively. Extensive experiments show that CASP brings significant and consistent improvements to the performance of the model across various distribution shift settings and with different backbones, demonstrating its effectiveness and versatility. Our code is available at https://github.com/zrguo/CASP.

Paper Structure

This paper contains 14 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Test-time adaptation for multimodal sentiment analysis. The source domain data is used for source model training and is unavailable during the adaptation process. The target domain data is unlabeled.
  • Figure 2: The overall framework of CASP. The adaptation process of CASP has two stages. Stage 1: contrastive adaptation, which enforces consistency via random modality dropout. Stage 2: the checkpoints generated in Stage 1 are used to select high-confidence pseudo-labels for self-training. The two stages address the distribution shifts through consistency regularization and empirical risk minimization, respectively.
  • Figure 3: Overview of the contrastive adaptation strategy. We randomly drop a modality to generate new data. We then pull the representations of the original data and the new data closer together and push the representation of the original data away from the other representations in the batch.
  • Figure 4: The distribution of stability $s$ on MOSEI$\rightarrow$SIMS.
  • Figure 5: Effectiveness of the stable pseudo-label generation strategy across five different distribution shift settings and with two different backbones. $T_1,T_2,T_3,T_4$ and $T_5$ have the same meaning as in Table \ref{ab1}.
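
The contrastive adaptation idea in the Figure 3 caption (drop one modality at random, pull each sample toward its own dropped view, push it away from the rest of the batch) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the InfoNCE-style loss, the temperature value, zero-filling as the dropout mechanism, the choice of other samples' dropped views as negatives, and the function names `drop_random_modality` and `info_nce` are all assumptions made for illustration.

```python
import numpy as np


def drop_random_modality(batch, rng):
    """Return a copy of `batch` with one randomly chosen modality zeroed out.

    `batch` maps modality names to arrays of shape (batch_size, feature_dim).
    Zero-filling is an assumed stand-in for "dropping" a modality.
    """
    names = sorted(batch)
    dropped = names[rng.integers(len(names))]
    view = {
        name: np.zeros_like(feats) if name == dropped else feats.copy()
        for name, feats in batch.items()
    }
    return view, dropped


def info_nce(z_orig, z_view, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    Row i of `z_orig` is pulled toward row i of `z_view` (its own dropped
    view) and pushed away from every other row of `z_view` in the batch.
    """
    eps = 1e-12  # guards against division by zero for all-zero rows
    z_orig = z_orig / (np.linalg.norm(z_orig, axis=1, keepdims=True) + eps)
    z_view = z_view / (np.linalg.norm(z_view, axis=1, keepdims=True) + eps)

    logits = z_orig @ z_view.T / temperature          # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # positives on the diagonal
```

In an actual adaptation loop, `z_orig` and `z_view` would be the fused representations produced by the source model for the original and the modality-dropped inputs, and the loss would be minimized with respect to the model parameters on unlabeled target data.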