MMM-Fact: A Multimodal, Multi-Domain Fact-Checking Dataset with Multi-Level Retrieval Difficulty

Wenyan Xu; Dawei Xiang; Tianqi Ding; Weihai Lu

TL;DR

MMM-Fact addresses the need for realistic fact-checking benchmarks by pairing 125,449 statements (1995–2025) with full articles and multimodal evidence from multiple sources. It introduces retrieval-difficulty tiers (Basic/Intermediate/Advanced) and a three-class veracity scheme to enable fair, curriculum-style evaluation of multi-step, cross-modal reasoning. The dataset spans diverse domains and preserves auditable evidence chains for end-to-end workflows and longitudinal analyses. Baselines with mainstream LLMs show MMM-Fact is substantially harder than prior benchmarks, with performance dropping as evidence complexity increases, underscoring the importance of cross-modal alignment and reasoning strategies.

Abstract

Misinformation and disinformation demand fact-checking that goes beyond simple evidence-based reasoning. Existing benchmarks fall short: they are largely single-modality (text-only), span short time horizons, use shallow evidence, cover domains unevenly, and often omit full articles -- obscuring models' real-world capability. We present MMM-Fact, a large-scale benchmark of 125,449 fact-checked statements (1995--2025) across multiple domains, each paired with the full fact-check article and multimodal evidence (text, images, videos, tables) from four fact-checking sites and one news outlet. To reflect verification effort, each statement is tagged with a retrieval-difficulty tier -- Basic (1--5 sources), Intermediate (6--10), and Advanced (>10) -- supporting fairness-aware evaluation for multi-step, cross-modal reasoning. The dataset adopts a three-class veracity scheme (true/false/not enough information) and enables tasks in veracity prediction, explainable fact-checking, complex evidence aggregation, and longitudinal analysis. Baselines with mainstream LLMs show MMM-Fact is markedly harder than prior resources, with performance degrading as evidence complexity rises. MMM-Fact offers a realistic, scalable benchmark for transparent, reliable, multimodal fact-checking.
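
The tier thresholds stated in the abstract can be illustrated with a minimal Python sketch. The function name `retrieval_tier`, the `Tier` enum, and the choice to reject counts below 1 are assumptions made for illustration only. The abstract does not say how statements with zero sources are handled, and this is not the authors' released code.

```python
from enum import Enum


class Tier(Enum):
    """Retrieval-difficulty tiers named in the abstract."""
    BASIC = "Basic"
    INTERMEDIATE = "Intermediate"
    ADVANCED = "Advanced"


def retrieval_tier(num_sources: int) -> Tier:
    """Map an evidence-source count to a tier (hypothetical helper).

    Thresholds follow the abstract: 1-5 Basic, 6-10 Intermediate, >10 Advanced.
    Assumption: counts below 1 are rejected, since the abstract does not
    define a tier for them.
    """
    if num_sources < 1:
        raise ValueError("the abstract defines no tier for fewer than 1 source")
    if num_sources <= 5:
        return Tier.BASIC
    if num_sources <= 10:
        return Tier.INTERMEDIATE
    return Tier.ADVANCED
```

A helper like this could be used to bucket evaluation results by tier, so that accuracy can be reported separately for Basic, Intermediate, and Advanced statements.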
