Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

Fengzhu Zeng, Wenqian Li, Wei Gao, Yan Pang

TL;DR

Experiments show that the proposed method enhances the performance of a small MLLM on real-world fact-checking datasets, enabling it to even surpass GPT-4V~\cite{GPT-4V}.

Abstract

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V~\cite{GPT-4V}.

Paper Structure

This paper contains 39 sections, 3 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Elon Musk holding a flag that says "Trump Won, Democrats Cheated."
  • Figure 2: 2D PCA projection of the multimodal features. (a) MediaEval and Snopes datasets; (b) the top-10 events with the most instances in MediaEval, where each colored group represents an event and the three highlighted groups are the top-3 events; (c) F1 scores of the semantic and distributional selection methods on the top-3 events with the most instances from the MediaEval dataset. Standard deviations are given in parentheses. PA: Paris Attack; MFL: Mt Fuji Lenticular; GP: German Protest.
  • Figure 3: F1 score as the number of synthetic instances selected for training increases.
  • Figure 4: Visualization of real-world instances of the 2015 Paris Attack event.
  • Figure 5: Visualization of synthetic instances with high similarity to real-world data.
  • ...and 1 more figures