Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation

Fatma Shalabi, Huy H. Nguyen, Hichem Felouat, Ching-Chun Chang, Isao Echizen

TL;DR

This paper addresses Out-Of-Context Detection (OOCD) for multimodal misinformation, using synthetic data generation to offset the scarcity of annotated data. Its data-augmentation pipeline builds synthetic image-caption tuples (I, C, I', C') with a captioning model (BLIP-2) and a text-to-image generator (Stable Diffusion), then fuses features from CLIP, SBERT, and ViT to train a binary matcher that decides whether an image and its caption are contextually consistent. On the NewsCLIPpings dataset the approach is competitive, reaching a best accuracy of around 68%, and the results suggest that synthetic data can improve OOCD robustness and efficiency. The work contributes a practical, scalable framework and a new dataset resource for robust, real-time detection of multimodal misinformation.
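
The fusion-then-classify step described above can be illustrated with a minimal sketch. It is not the authors' implementation: random arrays stand in for the CLIP, SBERT, and ViT embeddings, the embedding sizes (512, 512, 384, 768) are assumed, fusion is assumed to be simple concatenation, and a plain logistic-regression head stands in for the binary matcher. The function names are hypothetical.

```python
import numpy as np

def fuse_features(clip_img, clip_txt, sbert_txt, vit_img):
    """Concatenate L2-normalised embeddings into one vector per image-caption pair.

    Concatenation is an assumption; the summary only says the features are fused.
    """
    parts = [clip_img, clip_txt, sbert_txt, vit_img]
    normed = [p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-12) for p in parts]
    return np.concatenate(normed, axis=1)

def train_binary_matcher(x, y, lr=0.5, epochs=200):
    """Fit a logistic-regression head with batch gradient descent (stand-in classifier)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(pair is out of context)
        grad = p - y                            # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(x, w, b):
    """Return 1 for out-of-context, 0 for consistent."""
    return (x @ w + b > 0).astype(int)

# Placeholder data: 8 pairs with random embeddings and random labels.
rng = np.random.default_rng(0)
n = 8
clip_img = rng.normal(size=(n, 512))   # assumed CLIP image embedding size
clip_txt = rng.normal(size=(n, 512))   # assumed CLIP text embedding size
sbert_txt = rng.normal(size=(n, 384))  # assumed SBERT embedding size
vit_img = rng.normal(size=(n, 768))    # assumed ViT embedding size
labels = rng.integers(0, 2, size=n).astype(float)

fused = fuse_features(clip_img, clip_txt, sbert_txt, vit_img)
w, b = train_binary_matcher(fused, labels)
preds = predict(fused, w, b)
print(fused.shape, preds)
```

In practice the placeholder arrays would be replaced by real encoder outputs for each image-caption pair, and the head would be trained on both the original and the synthetic pairs.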

Abstract

Misinformation has become a major challenge in the era of increasing digital information, requiring the development of effective detection methods. We have investigated a novel approach to Out-Of-Context detection (OOCD) that uses synthetic data generation. We created a dataset specifically designed for OOCD and developed an efficient detector for accurate classification. Our experimental findings validate the use of synthetic data generation and demonstrate its efficacy in addressing the data limitations associated with OOCD. The dataset and detector should serve as valuable resources for future research and the development of robust misinformation detection systems.

Paper Structure

This paper contains 20 sections, 1 equation, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Representation of steps involved in dataset preparation.
  • Figure 2: Overview of our proposed approach.