M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

Jiahui Geng, Jonathan Tonglet, Iryna Gurevych

TL;DR

M4FC addresses a gap in multimodal fact-checking by providing a large, real-world, multilingual, multicultural, and multitask dataset sourced from 22 organizations across 17 countries in 10 languages. It introduces two new tasks, visual claim extraction and location verification, alongside existing automated fact-checking (AFC) tasks, enabling a realistic pipeline that connects intermediate outputs to verdict prediction. Baseline experiments across six tasks reveal the challenges contemporary models face in generation, multilingual understanding, and cross-modal reasoning, while showing that incorporating intermediate tasks and retrieved evidence can substantially boost verdict performance. The dataset thereby offers a resource for advancing real-world multimodal AFC research and evaluating cross-language generalization, with practical implications for improving misinformation countermeasures across diverse contexts.

Abstract

Existing real-world datasets for multimodal automated fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or depend on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent diverse cultural and geographic contexts. Each claim is available in one or two of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences downstream verdict prediction performance. We make our dataset and code available.

Paper Structure

This paper contains 54 sections, 17 figures, and 15 tables.

Figures (17)

  • Figure 1: An out-of-context instance from M4FC. The six AFC tasks are shown in blue and their output in gray.
  • Figure 2: Illustration of the visual claim extraction task. Blue, yellow, and gray boxes indicate visual cues, external knowledge, and the extracted claim, respectively.
  • Figure 3: Illustration of the location verification task. Satellite images are sourced from ESRI World Imagery.
  • Figure 4: Visual claim extraction (VCE), claimant intent prediction (CIP), fake detection (FD), and verdict prediction (VP) results (%). All VP results are shown in the balanced setting.
  • Figure 5: Visual claim extraction error examples with Gemini-1.5-Flash on M4FC test.
  • ...and 12 more figures