MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval

Delvin Ce Zhang, Suhan Cui, Zhelin Chu, Xianren Zhang, Dongwon Lee

TL;DR

This work addresses the challenge of verifying claims that hinge on both textual and visual evidence, introducing MEVER, a holistic framework that jointly retrieves multi-modal evidence, reasons across modalities for verification, and generates faithful explanations. The system combines a two-layer multimodal evidence graph with image-to-text and text-to-image reasoning, token- and evidence-level fusion for robust verification, and a Fusion-in-Decoder approach with a consistency regularizer to produce aligned explanations. To support scientific-domain investigations, the authors introduce AIChartClaim, a dataset of 1,200 claims, 300 charts with captions, and explanations derived from AI-domain papers, augmented with negations and GPT-4o-generated content. Across evidence retrieval, claim verification, and explanation generation, MEVER surpasses strong baselines, and ablation studies confirm the importance of multimodal retrieval, cross-modal fusion, and multi-evidence explanation. The work has practical impact for trustworthy, transparent AI in scientific chart reasoning and provides a foundation for future expansion to additional domains and knowledge-graph-based verification.

Abstract

Verifying the truthfulness of a claim usually requires joint multi-modal reasoning over both textual and visual evidence, for example analyzing both a textual caption and a chart image. In addition, to make the reasoning process transparent, a textual explanation is needed to justify the verification result. However, most claim verification work either reasons over textual evidence only or ignores explainability, resulting in inaccurate and unconvincing verification. To address this problem, we propose a novel model that jointly performs evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph over claims and evidence, in which we design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token- and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce a multi-modal Fusion-in-Decoder for explainability. Finally, since almost all existing datasets are in the general domain, we create a scientific dataset in the AI domain, AIChartClaim, to complement the resources available to the claim verification community. Experiments show the strength of our model.
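
As a rough illustration of what evidence-level fusion of claim and evidence embeddings could look like, the sketch below pools several evidence vectors with claim-conditioned attention weights and concatenates the result with the claim vector. This is a generic attention-pooling pattern chosen for illustration, not the formulation used in the paper; the function name `fuse_evidence`, the toy two-dimensional vectors, and the scaled dot-product scoring are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_evidence(claim_vec, evidence_vecs):
    """Hypothetical evidence-level fusion (an assumption, not the paper's method).

    Scores each evidence embedding against the claim with a scaled dot
    product, pools the evidence with the resulting softmax weights, and
    returns the claim vector concatenated with the pooled evidence.
    """
    scale = math.sqrt(len(claim_vec))
    weights = softmax([dot(claim_vec, e) / scale for e in evidence_vecs])
    pooled = [
        sum(w * e[i] for w, e in zip(weights, evidence_vecs))
        for i in range(len(claim_vec))
    ]
    # List concatenation: the fused vector is twice the embedding size.
    return claim_vec + pooled, weights


# Toy inputs: one claim and three evidence embeddings (made-up values).
claim = [1.0, 0.0]
evidence = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = fuse_evidence(claim, evidence)
```

In this toy setup the evidence vector most aligned with the claim receives the largest weight, so it dominates the pooled representation that a downstream classifier would see.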

Paper Structure

This paper contains 29 sections, 19 equations, 4 figures, 14 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Illustration of multi-modal and explainable claim verification, taken from the AIChartClaim dataset.
  • Figure 2: Model architecture. (a-b) Cross-modal graph reasoning. (c) A nested architecture with multi-modal graph reasoning. (d) Multi-modal token-level fusion. (e) Multi-modal explanation generation with Fusion-in-Decoder.
  • Figure 3: Model analysis on the AIChartClaim and Mocheg datasets.
  • Figure 4: Case study on the AIChartClaim dataset.