
AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web

Rui Cao, Zifeng Ding, Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos

TL;DR

AVerImaTeC introduces a real-world image-text claim verification dataset of 1,297 claims, each annotated with QA-style reasoning backed by evidence retrieved from the web. It addresses key challenges in multimodal fact-checking, namely contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. Annotation quality is supported by substantial inter-annotator agreement on verdicts ($\kappa = 0.742$) and $74.7\%$ consistency on QA pairs. The work proposes a reference-based multimodal evidence evaluation framework and establishes baselines using open-web tools and multiple LLM/MLLM configurations; these baselines show that evidence retrieval is difficult and that reliable verdicts depend on strong evidence. The dataset and evaluation protocol provide a foundation for future research in open-web multimodal fact-checking and could support the development of transparent, verifiable image-text claim verification systems. Overall, AVerImaTeC moves beyond synthetic datasets by coupling real-world claims with decomposed reasoning paths and explicit evidence, enabling more robust evaluation of multimodal verification models.
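The reference-based evidence evaluation compares a system's retrieved evidence against the annotated reference evidence. The sketch below is a minimal, text-only illustration under stated assumptions: `evidence_recall` and `token_overlap` are hypothetical names, and word-level Jaccard overlap with a 0.5 threshold is a crude stand-in, not the paper's matching procedure (which must also handle image evidence). The example strings are invented toy data.

```python
import re
from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; a crude stand-in for a model-based judge."""
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def evidence_recall(
    reference: List[str],
    retrieved: List[str],
    match: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> float:
    """Fraction of reference evidence items matched by at least one retrieved item."""
    if not reference:
        return 0.0
    covered = sum(
        1 for ref in reference
        if any(match(ref, cand) >= threshold for cand in retrieved)
    )
    return covered / len(reference)


# Toy example: only the first reference item is covered by the retrieved text.
reference = [
    "The photo was taken in 2015 in Nepal.",
    "The image shows flooding in Chennai.",
]
retrieved = ["The photo was taken in 2015 in Nepal after the earthquake."]
print(evidence_recall(reference, retrieved))  # 0.5
```

Keeping `match` as a parameter means the word-overlap stand-in could be swapped for a stronger (for example, model-based) judge without changing the recall logic.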

Abstract

Textual claims are often accompanied by images that enhance their credibility and help them spread on social media, which raises concerns about the spread of misinformation. Existing datasets for automated verification of image-text claims remain limited, as they often consist of synthetic claims and lack evidence annotations that capture the reasoning behind the verdict. In this work, we introduce AVerImaTeC, a dataset consisting of 1,297 real-world image-text claims. Each claim is annotated with question-answer (QA) pairs containing evidence from the web, reflecting decomposed reasoning toward the verdict. We mitigate common challenges in fact-checking datasets such as contextual dependence, temporal leakage, and evidence insufficiency, via claim normalization, temporally constrained evidence annotation, and a two-stage sufficiency check. We assess the consistency of the annotation in AVerImaTeC via inter-annotator studies, achieving $\kappa = 0.742$ on verdicts and $74.7\%$ consistency on QA pairs. We also propose a novel evaluation method for evidence retrieval and conduct extensive experiments to establish baselines for verifying image-text claims using open-web evidence.
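Verdict agreement of the kind reported above is commonly measured with Cohen's kappa, which corrects the observed agreement $p_o$ for chance agreement $p_e$: $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below assumes two annotators labelling the same claims; the paper may use a different variant (for example, Fleiss' kappa for more than two annotators), and the label names in the example are illustrative, not the dataset's label set.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of both annotators' label counts.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy verdicts with illustrative labels (not the dataset's label set).
ann_1 = ["Refuted", "Refuted", "Supported", "NEI", "Refuted", "Supported"]
ann_2 = ["Refuted", "Supported", "Supported", "NEI", "Refuted", "Refuted"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 0.455
```

Here the annotators agree on 4 of 6 items ($p_o \approx 0.667$), but identical marginals give $p_e = 14/36$, so kappa is noticeably lower than raw agreement.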


Paper Structure

This paper contains 39 sections, 9 figures, 13 tables.

Figures (9)

  • Figure 1: An annotated claim from AVerImaTeC. The rationale for verifying an image-text claim is decomposed into a sequence of QA pairs, some of which may be multimodal.
  • Figure 2: Annotation pipeline. We first normalize the claim, then perform QA annotation to structure evidence retrieval. Two rounds of evidence sufficiency checks ensure annotation quality.
  • Figure 3: Platform and instructions for validating annotators' agreement on QA annotations.
  • Figure 4: The prompt used to convert QA pairs into evidence statements.
  • Figure 5: The evaluation prompt for generated questions.
  • ...and 4 more figures
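Figure 1 describes each annotated claim as an image-text claim whose verification rationale is a sequence of QA pairs, some multimodal, and Figure 4 converts QA pairs into evidence statements with an LLM prompt. A minimal sketch of such a record follows. Every class and field name is hypothetical rather than the dataset's actual schema, and `qa_to_statement` is a naive template stand-in for the LLM-based conversion, not the paper's prompt. The example claim is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QAPair:
    """One step of the verification rationale (hypothetical schema)."""
    question: str
    answer: str
    question_image: Optional[str] = None  # path or URL of an image in the question, if any
    answer_image: Optional[str] = None    # path or URL of an image used as evidence, if any
    source_url: Optional[str] = None      # web page the answer was taken from


@dataclass
class AnnotatedClaim:
    """An image-text claim plus its decomposed rationale and verdict (hypothetical schema)."""
    claim_text: str
    claim_images: List[str]
    qa_pairs: List[QAPair] = field(default_factory=list)
    verdict: Optional[str] = None

    def has_multimodal_evidence(self) -> bool:
        # True if any QA pair carries an image on either side.
        return any(qa.question_image or qa.answer_image for qa in self.qa_pairs)


def qa_to_statement(qa: QAPair) -> str:
    """Naive template stand-in for the LLM prompt that rewrites a QA pair as a statement."""
    q = qa.question.strip().rstrip("?")
    a = qa.answer.strip().rstrip(".")
    statement = f"{q}: {a}."
    if qa.answer_image:
        statement += f" [image: {qa.answer_image}]"
    return statement


claim = AnnotatedClaim(
    claim_text="This photo shows the 2023 flood.",  # toy example, not from the dataset
    claim_images=["claim.jpg"],
    qa_pairs=[QAPair("When was the image first published online?", "In 2015.",
                     source_url="https://example.org")],
)
print(claim.has_multimodal_evidence())   # False
print(qa_to_statement(claim.qa_pairs[0]))  # When was the image first published online: In 2015.
```

A template like this only concatenates question and answer; a fluent evidence statement of the kind Figure 4 targets would need the LLM rewriting step the figure describes.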