Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

Fengzhu Zeng; Wenqian Li; Wei Gao; Yan Pang

TL;DR

Experiments show that the proposed method enhances the performance of a small MLLM on real-world fact-checking datasets, enabling it to even surpass GPT-4V.

Abstract

Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distribution gap. To address this, we propose learning from synthetic data for detecting real-world multimodal misinformation through two model-agnostic data selection methods that match synthetic and real-world data distributions. Experiments show that our method enhances the performance of a small MLLM (13B) on real-world fact-checking datasets, enabling it to even surpass GPT-4V.
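
The idea of selecting synthetic data that resembles real-world data can be illustrated with a minimal sketch. It is not the paper's SemSim or DisSim procedure, whose definitions are not given in this summary. It assumes each instance has already been reduced to a feature vector and ranks synthetic instances by their mean cosine similarity to the real-world set. The function names, the scoring rule, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_by_similarity(synthetic, real, k):
    """Return indices of the k synthetic vectors most similar to the real set.

    Each synthetic vector is scored by its mean cosine similarity to all
    real-world vectors (an assumed scoring rule, not taken from the paper).
    """
    scores = []
    for idx, s in enumerate(synthetic):
        mean_sim = sum(cosine(s, r) for r in real) / len(real)
        scores.append((mean_sim, idx))
    # Highest score first; ties broken by lower index for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]


# Toy features standing in for multimodal embeddings (hypothetical values).
real_features = [[1.0, 0.0], [0.9, 0.1]]
synthetic_features = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5], [-1.0, 0.0]]

selected = select_by_similarity(synthetic_features, real_features, k=2)
print(selected)
```

In this toy setup the two synthetic vectors pointing in roughly the same direction as the real-world vectors are kept, and only those would be passed on as training data for the detector.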

Paper Structure (39 sections, 3 equations, 6 figures, 2 tables)

Introduction
Problem Formulation
Multimodal Misinformation Data
Common Categories
Training and Evaluation Datasets
Methodology
Feature Extraction.
Semantic Similarity (SemSim).
Distributional Similarity (DisSim).
Experimental Evaluation
Experimental Settings
Datasets.
Base MLLMs.
Baselines.
Default Settings.
...and 24 more sections

Figures (6)

Figure 1: Elon Musk holding a flag that says "Trump Won, Democrats Cheated."
Figure 2: The 2D projection of the multimodal features using PCA. (a) MediaEval and Snopes datasets; (b) the top-10 events with the most instances in MediaEval, where each colored group represents an event and the three highlighted groups are the top-3 events; (c) the F1 score of the semantic and distributional selection methods on the top-3 events with the most instances from the MediaEval dataset, with standard deviations in parentheses. PA: Paris Attack; MFL: Mt Fuji Lenticular; GP: German Protest.
Figure 3: The F1 score as the number of selected synthetic instances used for training increases.
Figure 4: Visualization of real-world instances of the 2015 Paris Attack event.
Figure 5: Visualization of synthetic instances with high similarity to real-world data.
...and 1 more figures
