Out-of-Context Misinformation Detection via Variational Domain-Invariant Learning with Test-Time Training

Xi Yang, Han Zhang, Zhijian Lin, Yibiao Hu, Hong Han

TL;DR

This work tackles out-of-context (OOC) misinformation detection under cross-domain distribution shifts by introducing VDT, a framework that learns domain-invariant representations through a shared Domain-Invariant Variational Align (DIVA) module and counteracts semantic collapse with a domain consistency constraint. It integrates test-time training (TTT) and a confidence-variance filtering (CVF) mechanism to adapt to unseen target domains using unlabeled data, updating both the VAE encoder and the classifier. Evaluations on NewsCLIPpings show that VDT achieves competitive or superior performance relative to state-of-the-art baselines across multiple domain transfer settings, with ablation studies highlighting the importance of the DIVA and consistency losses and the CVF strategy. The approach demonstrates practical potential for robust, scalable OOC misinformation detection in real-world, multilingual news ecosystems where domain drift is common.

Abstract

Out-of-context (OOC) misinformation is a low-cost form of misinformation in news reports in which authentic images are placed into out-of-context or fabricated image-text pairings. This problem has attracted significant attention from researchers in recent years. Current methods focus on assessing image-text consistency or generating explanations. However, these approaches assume that the training and test data are drawn from the same distribution. When encountering novel news domains, models tend to perform poorly due to the lack of prior knowledge. To address this challenge, we propose VDT, which enhances the domain adaptation capability of OOC misinformation detection by learning domain-invariant features and applying test-time training. A Domain-Invariant Variational Align module jointly encodes source and target domain data to learn a separable distributional space of domain-invariant features. To preserve semantic integrity, a domain consistency constraint module reconstructs the source and target domain latent distributions. During the testing phase, we adopt a test-time training strategy and a confidence-variance filtering module to dynamically update the VAE encoder and classifier, facilitating the model's adaptation to the target domain distribution. Extensive experiments on the benchmark dataset NewsCLIPpings demonstrate that our method outperforms state-of-the-art baselines under most domain adaptation settings.
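
The abstract names a confidence-variance filtering step but does not spell out its rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each unlabeled target sample is scored by several stochastic forward passes (for example, different latent samples from the VAE encoder), and that a sample is kept for the test-time update only when its averaged prediction is confident and the passes agree. The function name `confidence_variance_filter` and the parameters `theta` and `max_var` are hypothetical and chosen for illustration.

```python
from statistics import mean, pvariance


def confidence_variance_filter(prob_samples, theta=0.9, max_var=0.01):
    """Return indices of samples judged reliable enough for a test-time update.

    prob_samples[i] holds K predicted probabilities of the positive
    (out-of-context) class for sample i, one per stochastic forward pass.
    Both thresholds are assumptions made for this sketch.
    """
    kept = []
    for idx, probs in enumerate(prob_samples):
        p = mean(probs)
        # Confidence of the predicted binary label, whichever class it is.
        confidence = max(p, 1.0 - p)
        # Disagreement between passes (population variance).
        variance = pvariance(probs)
        if confidence >= theta and variance <= max_var:
            kept.append(idx)
    return kept


if __name__ == "__main__":
    preds = [
        [0.97, 0.95, 0.96],  # confident and stable
        [0.55, 0.60, 0.50],  # low confidence
        [1.00, 0.75, 1.00],  # confident on average, but passes disagree
        [0.03, 0.02, 0.04],  # confident "not OOC" and stable
    ]
    print(confidence_variance_filter(preds))
```

Under these assumptions, only the first and last samples would contribute pseudo-labels to the update of the VAE encoder and classifier; the second is rejected for low confidence and the third for high variance across passes.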

Paper Structure

This paper contains 29 sections, 18 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Illustration of the domain adaptation challenge in out-of-context misinformation detection. The model fails on the target domain because of the significant distribution gap between the source and target domains. By learning domain-invariant features through domain adaptation, the model adapts better to the target domain.
  • Figure 2: Illustration of the proposed VDT model.
  • Figure 3: VDT's performance with different values of $\beta$ (sub-figures (a) and (b)) and of the threshold $\theta$ (sub-figures (c) and (d)). The legend indicates the target domains.
  • Figure 4: t-SNE visualization of the multimodal feature $X$ and the learned domain-invariant feature $F$.
  • Figure 5: Statistical comparison of performance with statistical significance annotations. "ModelA" denotes our proposed method VDT, and "ModelB" refers to ConDA-TTT. "*" denotes $p<0.05$, "**" denotes $p<0.01$, "***" denotes $p<0.001$, and "ns" denotes not significant.
  • ...and 1 more figure