Learning Domain-Invariant Features for Out-of-Context News Detection

Yimeng Gu, Mengqi Zhang, Ignacio Castro, Shu Wu, Gareth Tyson

TL;DR

This work tackles out-of-context news detection under domain shift by introducing ConDA-TTA, a framework that combines contrastive domain adaptation, maximum mean discrepancy (MMD), and test-time adaptation. A BLIP-2-based multimodal encoder maps image-text pairs into a shared representation, which is pushed toward domain-invariant structure via a projection head, contrastive losses, and MMD, while BatchNorm statistics are updated at test time to align with the target domain. The authors show that ConDA-TTA outperforms baselines in most domain adaptation settings on Twitter-COMMs and NewsCLIPpings, with improvements that are particularly notable when domain gaps are larger, and provide ablations and visualizations to support the claimed domain-invariant effect. Because the target domain needs no labels, the approach lowers annotation costs and improves robustness to unseen topics or agencies, offering practical value for scalable misinformation detection across diverse sources.
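To make the test-time step concrete, here is a minimal sketch of re-estimating normalization statistics on unlabeled target data. It is an illustration under stated assumptions, not the authors' implementation: the `BatchNorm1D` class, its `fit_statistics` method, the scalar features, and the choice to fully replace the source statistics with those of one target batch are all hypothetical simplifications.

```python
import random
import statistics


class BatchNorm1D:
    """Toy batch-norm layer over scalar features (hypothetical; no learnable scale/shift)."""

    def __init__(self, eps=1e-5):
        self.mean = 0.0
        self.var = 1.0
        self.eps = eps

    def fit_statistics(self, batch):
        # Assumption: statistics are simply replaced by those of the given batch.
        self.mean = statistics.fmean(batch)
        self.var = statistics.pvariance(batch)

    def __call__(self, batch):
        scale = (self.var + self.eps) ** 0.5
        return [(x - self.mean) / scale for x in batch]


rng = random.Random(0)
source_batch = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stands in for source-domain features
target_batch = [rng.gauss(3.0, 2.0) for _ in range(64)]  # shifted target-domain features

bn = BatchNorm1D()
bn.fit_statistics(source_batch)   # statistics learned during training
unadapted = bn(target_batch)      # target features normalized with source statistics

bn.fit_statistics(target_batch)   # test-time update, no labels needed
adapted = bn(target_batch)        # target features normalized with target statistics

unadapted_mean = statistics.fmean(unadapted)
adapted_mean = statistics.fmean(adapted)
adapted_var = statistics.pvariance(adapted)
```

With source statistics, the normalized target features stay far from zero mean; after the update they are centered with roughly unit variance, which is the distribution the downstream layers saw during training.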

Abstract

Out-of-context news is a common type of misinformation on online media platforms. It involves posting a caption alongside a mismatched news image. Existing out-of-context news detection models only consider the scenario where pre-labeled data is available for each domain, failing to address out-of-context news detection on unlabeled domains (e.g., news topics or agencies). In this work, we therefore focus on domain-adaptive out-of-context news detection. To effectively adapt the detection model to unlabeled news topics or agencies, we propose ConDA-TTA (Contrastive Domain Adaptation with Test-Time Adaptation), which applies contrastive learning and maximum mean discrepancy (MMD) to learn domain-invariant features. In addition, we leverage test-time target domain statistics to further assist domain adaptation. Experimental results show that our approach outperforms baselines in most domain adaptation settings on two public datasets, by as much as 2.93% in F1 and 2.08% in accuracy.
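As a rough illustration of the MMD term mentioned above, the sketch below computes a biased estimate of squared MMD between two samples of feature vectors. The RBF kernel, the bandwidth `gamma`, and the helper names `rbf_kernel`, `mmd_squared`, and `sample` are assumptions made for this example; the summary does not state which kernel or estimator the paper uses.

```python
import math
import random


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))


def sample(mean, n, dim, rng):
    """Draw n synthetic feature vectors from an isotropic Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
src = sample(0.0, 50, 4, rng)        # stands in for source-domain features
tgt_near = sample(0.0, 50, 4, rng)   # target drawn from the same distribution
tgt_far = sample(2.0, 50, 4, rng)    # target with a shifted mean

mmd_near = mmd_squared(src, tgt_near, gamma=0.25)
mmd_far = mmd_squared(src, tgt_far, gamma=0.25)
mmd_self = mmd_squared(src, src, gamma=0.25)
```

The estimate grows as the two feature distributions move apart, so minimizing it as a training loss pulls source and target features toward a common distribution, which is the sense in which the learned features become domain-invariant.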

Paper Structure

This paper contains 22 sections, 11 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Examples of out-of-context news of three different news topics from the Twitter-COMMs dataset.
  • Figure 2: The model architecture of ConDA-TTA. We first use the (i) Multimodal Feature Encoder to encode the news and its augmentation into multimodal representations. Then, in (ii) Contrastive Domain Adaptation, we apply contrastive learning and maximum mean discrepancy (MMD) to learn the domain-invariant feature. Finally, we adopt (iii) Test-Time Adaptation to update statistic-related model parameters in the evaluation phase.
  • Figure 3: t-SNE visualization of the multimodal feature $\textbf{x}$ and the learned domain-invariant feature $\textbf{z}$ under Cv, Cl $\rightarrow$ M.
  • Figure 4: ConDA-TTA's performance (in accuracy) with different $\lambda_{MMD}$ and $\lambda_{ctr}$ values. The legend shows the target domain.
  • Figure 5: ConDA-TTA's performance (in accuracy) with different batch sizes. The legend shows the target domain.