Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation
Fatma Shalabi, Huy H. Nguyen, Hichem Felouat, Ching-Chun Chang, Isao Echizen
TL;DR
This paper tackles Out-Of-Context Detection (OOCD) in multimodal misinformation, using synthetic data generation to overcome the scarcity of annotated data. It proposes a data-augmentation pipeline that extends each image-caption pair (I, C) into a quadruple (I, C, I', C'). The quadruple adds a synthetic caption C' produced by a captioning model (BLIP-2) and a synthetic image I' produced by a text-to-image generator (Stable Diffusion). Multimodal features from CLIP, SBERT, and ViT are then fused to train a binary matcher that decides whether an image and its caption are consistent in context. On the NewsCLIPpings dataset the approach achieves competitive performance, with a best accuracy of around 68%. The results suggest that synthetic data can improve the robustness and efficiency of OOCD. The work contributes a practical, scalable framework and a new dataset resource to advance robust, real-time detection of multimodal misinformation.
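The fusion-plus-matcher idea above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. Real CLIP, SBERT, and ViT encoders are replaced by toy 2-D vectors. I' and C' are treated as the synthetic counterparts of I and C. Each encoder contributes a single cosine-similarity score: CLIP compares I with C, SBERT compares C with C', and ViT compares I with I'. The paper's actual fusion of features may differ. The binary matcher is plain logistic regression. The helper names `fuse_features` and `train_matcher` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_features(clip_img: Vector, clip_txt: Vector,
                  sbert_cap: Vector, sbert_syn_cap: Vector,
                  vit_img: Vector, vit_syn_img: Vector) -> List[float]:
    """Hypothetical fusion (assumption): one similarity score per encoder.

    clip:  original image I vs original caption C (shared image-text space)
    sbert: original caption C vs synthetic caption C' (text only)
    vit:   original image I vs synthetic image I' (image only)
    """
    return [cosine(clip_img, clip_txt),
            cosine(sbert_cap, sbert_syn_cap),
            cosine(vit_img, vit_syn_img)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Probability that the image-caption pair is out-of-context (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_matcher(X: List[List[float]], y: List[int],
                  lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Stand-in binary matcher: logistic regression fit by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        # Average the gradients over the batch and take one descent step.
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Toy 2-D "embeddings" standing in for real encoder outputs.
# Label 0 = pristine (consistent), 1 = out-of-context.
same, near, far = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
X = [
    fuse_features(same, same, same, same, same, same),  # pristine
    fuse_features(same, near, same, near, same, near),  # pristine, slight drift
    fuse_features(same, far, same, far, same, far),     # out-of-context
    fuse_features(same, far, same, near, same, far),    # out-of-context, captions agree
]
y = [0, 0, 1, 1]
w, b = train_matcher(X, y)
print([round(predict(w, b, x), 2) for x in X])
```

Under these assumptions, the design choice rests on one intuition. A pristine pair should keep its meaning when it is regenerated across modalities, so its similarity scores stay high. An out-of-context pair should drift, so at least some of its scores drop. A learned matcher over such cross-checks is one way the synthetic I' and C' could add signal beyond the original (I, C) pair.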
Abstract
Misinformation has become a major challenge as the volume of digital information grows, calling for effective detection methods. We investigate a novel approach to Out-Of-Context Detection (OOCD) that uses synthetic data generation. We created a dataset specifically designed for OOCD and developed an efficient detector for accurate classification. Our experimental findings validate the use of synthetic data generation and demonstrate its efficacy in addressing the data limitations associated with OOCD. The dataset and detector should serve as valuable resources for future research and for the development of robust misinformation detection systems.
