
Can Out-of-Domain data help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection?

Amartya Bhattacharya, Debarshi Brahma, Suraj Nagaje Mahadev, Anmol Asati, Vikas Verma, Soma Biswas

TL;DR

DPOD tackles multimodal misinformation detection across multiple domains by leveraging out-of-domain data. It introduces label-aware alignment with a CLIP backbone to learn generalizable image-text representations, then constructs semantic domain vectors to capture inter-domain similarities, and finally performs domain-specific prompt-tuning with learnable prompts conditioned on these vectors. The approach achieves state-of-the-art results on NewsCLIPpings and VERITE, including strong performance in unknown-domain scenarios and across datasets, while ablations confirm the value of each component. This framework enables rapid, domain-aware deployment with limited domain-specific annotations, offering practical benefits for scalable, cross-domain fake news detection.

Abstract

Fake news built from out-of-context images and captions has become widespread in this era of information overload. Since fake news can belong to different domains, such as politics and sports, each with its own characteristics, inference on a test image-caption pair is contingent on how well the model has been trained on similar data. Since training individual models for each domain is not practical, we propose a novel framework termed DPOD (Domain-specific Prompt tuning using Out-of-domain data), which can exploit out-of-domain data during training to improve fake news detection for all desired domains simultaneously. First, to compute generalizable features, we modify the Vision-Language Model CLIP to extract features that help align the representations of the images and corresponding captions of both the in-domain and out-of-domain data in a label-aware manner. Further, we propose a domain-specific prompt learning technique that leverages training samples from all the available domains according to how useful they are to the desired domain. Extensive experiments on the large-scale NewsCLIPpings and VERITE benchmarks demonstrate that DPOD achieves state-of-the-art performance on this challenging task. Code: https://github.com/scviab/DPOD.
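
The abstract does not spell out how the usefulness of an out-of-domain sample to a target domain is measured. As a minimal sketch only, assume each domain is summarized by the mean of its unit-normalized joint image-text embeddings, and that usefulness is a softmax over cosine similarity to the target domain's vector. The function names, the `temperature` parameter, and the toy two-dimensional embeddings are all hypothetical and are not taken from the DPOD code.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def domain_vectors(samples):
    """Average the unit-normalized embeddings of each domain.

    `samples` is an iterable of (domain_name, joint_embedding) pairs.
    This averaging rule is an assumption made for illustration.
    """
    sums = {}
    counts = defaultdict(int)
    for domain, emb in samples:
        emb = l2_normalize(emb)
        if domain not in sums:
            sums[domain] = [0.0] * len(emb)
        sums[domain] = [s + e for s, e in zip(sums[domain], emb)]
        counts[domain] += 1
    return {d: l2_normalize([s / counts[d] for s in vec]) for d, vec in sums.items()}


def domain_weights(vectors, target, temperature=0.1):
    """Softmax over cosine similarity between every domain and the target."""
    # All domain vectors are unit-norm, so the dot product is the cosine.
    sims = {d: sum(a * b for a, b in zip(vectors[target], v)) for d, v in vectors.items()}
    top = max(sims.values())  # subtract the maximum for numerical stability
    exps = {d: math.exp((s - top) / temperature) for d, s in sims.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


# Toy joint embeddings (hypothetical values, two dimensions for readability).
samples = [
    ("sports", [1.0, 0.1]),
    ("sports", [0.9, 0.0]),
    ("football", [0.9, 0.2]),
    ("music", [0.0, 1.0]),
    ("music", [0.1, 0.9]),
]
vectors = domain_vectors(samples)
weights = domain_weights(vectors, target="sports")
for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.4f}")
```

In this toy setup, samples from a closely related domain (football) receive far more weight for the sports target than samples from an unrelated one (music), which is the qualitative behavior the abstract describes.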

Paper Structure (9 sections, 10 equations, 6 figures, 5 tables)


Figures (6)

  • Figure 1: Illustration of the different stages of the proposed DPOD approach. In Stage 1, we learn the image and text encoders via the Label-Aware Alignment Loss. This model is then used to obtain semantic domain vectors in Stage 2. These vectors are in turn used in Stage 3 to learn domain-specific and generic prompts with out-of-domain samples to predict the veracity of the news.
  • Figure 2: Computation of the joint embedding of the images and text for a particular domain. These embeddings are used in Stages 2 and 3 of our DPOD framework.
  • Figure 3: Similarity of the learnt domain-specific prompts of different domains. We observe that the domains within the set {Football, Sport, Sports} and within the set {Healthcare Network, Healthcare Medicine} have similar domain-specific prompts among themselves, but the two sets have low similarity with each other and also with Music.
  • Figure 4: Comparison of accuracies (%) when the model is trained on 90% of the domains and evaluated on the remaining 10%, which are unseen during training.
  • Figure 5: Qualitative results of the proposed DPOD model. The first three columns on the left (green border) are examples of successful predictions, and the three on the right (pink border) are examples of failure cases.
  • ...and 1 more figures
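
The unseen-domain protocol in the Figure 4 caption (train on 90% of the domains, evaluate on the remaining 10%) can be sketched as a split at the domain level rather than the sample level. This is an illustration under stated assumptions: the helper names, the `seed` parameter, and the toy samples are hypothetical, and the actual domain split used in the paper is not given here.

```python
import random


def split_domains(domains, held_out_fraction=0.1, seed=0):
    """Partition the set of domain names into seen and unseen groups."""
    unique = sorted(set(domains))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_held = max(1, round(len(unique) * held_out_fraction))
    return set(unique[n_held:]), set(unique[:n_held])


def partition(samples, seen, unseen):
    """Route each sample to train or test according to its domain."""
    train = [s for s in samples if s["domain"] in seen]
    test = [s for s in samples if s["domain"] in unseen]
    return train, test


# Toy data: 10 hypothetical domains with 3 samples each.
domain_names = [f"domain_{i}" for i in range(10)]
toy_samples = [{"id": f"{d}_{k}", "domain": d} for d in domain_names for k in range(3)]

seen, unseen = split_domains(domain_names, held_out_fraction=0.1, seed=0)
train, test = partition(toy_samples, seen, unseen)
print(len(seen), "seen domains,", len(unseen), "unseen domains")
print(len(train), "training samples,", len(test), "test samples")
```

Splitting by domain guarantees that no test sample shares a domain with any training sample, which is what makes the evaluation a test of generalization to unknown domains.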