MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

Ting-Chih Chen, Chia-Wei Tang, Chris Thomas

TL;DR

The paper introduces MetaSumPerceiver (MSP), a multimodal, multi-document summarization framework designed to support fact-checking by producing concise, evidence-rich summaries from claims, documents, and images. MSP leverages a Perceiver-based architecture to handle arbitrary input lengths and modalities, and is trained with reinforcement learning using an entailment-based reward and KL regularization to generate summaries that facilitate truth assessment. A new Multi-News-Fact-Checking dataset is released, alongside extensive experiments on MOCHEG and this dataset showing state-of-the-art claim verification and explanation-generation performance, with ablations confirming the value of cross-modal evidence and critic-based guidance. The work demonstrates a promising direction for streamlining real-world fact-checking by producing targeted, verifiable summaries across heterogeneous sources.

Abstract

Fact-checking real-world claims often requires reviewing multiple multimodal documents to assess a claim's truthfulness, which is a highly laborious and time-consuming task. In this paper, we present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal, multi-document datasets. The model takes inputs in the form of documents, images, and a claim, with the objective of assisting in fact-checking tasks. We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths. To train our model, we leverage a novel reinforcement learning-based entailment objective to generate summaries that provide evidence distinguishing between different truthfulness labels. To assess the efficacy of our approach, we conduct experiments on both an existing benchmark and a new dataset of multi-document claims that we contribute. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset and demonstrates strong performance on our new Multi-News-Fact-Checking dataset.
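The ability to handle inputs of arbitrary length comes from the general Perceiver idea: a fixed-size set of latent vectors cross-attends to the inputs, so the output size depends on the number of latents and not on the number of input tokens. The following is a minimal sketch of that mechanism only, not the authors' implementation. It omits the learned query/key/value projections, and the names `cross_attend`, `latents`, and `inputs` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attend(latents, inputs):
    """Each latent vector attends over all input vectors.

    Simplifying assumption: the inputs serve directly as keys and values,
    and the latents serve directly as queries (no learned projections).
    Returns one output vector per latent, regardless of len(inputs).
    """
    dim = len(latents[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in latents:
        weights = softmax([dot(query, x) / scale for x in inputs])
        mixed = [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(dim)]
        outputs.append(mixed)
    return outputs


latents = [[1.0, 0.0], [0.0, 1.0]]           # 2 latent vectors of width 2
short_input = [[0.5, 0.5]] * 3                # e.g. 3 token/patch embeddings
long_input = [[0.5, 0.5]] * 50                # e.g. 50 token/patch embeddings

# The output always has one vector per latent, whatever the input length.
print(len(cross_attend(latents, short_input)), len(cross_attend(latents, long_input)))
```

Because text tokens and image patches can both be embedded as vectors of the same width, the same attention step can consume a concatenation of both, which is one plausible reading of how a single model can accept documents, images, and a claim together.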

Paper Structure

This paper contains 22 sections, 6 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overview of MetaSumPerceiver (MSP): Using inputs such as documents, images, and claims, MSP generates summaries to facilitate fact-checking. In this example, the summary provides evidence and establishes that the claim in question is entailed by the evidence.
  • Figure 2: Overview of MetaSumPerceiver (MSP): This figure illustrates the process of generating a summary for fact-checking using MSP, integrating a fixed entailment model for accurate truthfulness labeling. Furthermore, it highlights how PPO is employed to continually refine the summary during the fact-checking process.
  • Figure 3: The Proximal Policy Optimization (PPO) process starts with the summarizer generating a response to the input query. The reward model then scores this query-response pair, producing a single reward. In parallel, the KL-divergence is computed by comparing the likelihood of the response's token sequence under the currently fine-tuned active model and under a pre-trained reference model. The KL-divergence enters the reward as a regularization term, keeping the active model's responses close to those of the reference model. Additionally, the summary is passed to the Mistral LM to evaluate whether it is concise. Finally, PPO updates the parameters of the active model based on the reward model's output, the Mistral LM's assessment, and the value of the KL-divergence.
  • Figure 4: The normal summary is produced by our initial MSP model, while the concise and clear summary is generated using MSP trained with the $r_{quality}$ reward.
  • Figure 5: Explanation generation examples of Multimodal Fact-Checking. The Truthfulness column shows gold labels.
  • ...and 2 more figures
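
The reward signal described in the Figure 3 caption can be sketched as follows. This is an illustrative assumption, not the paper's exact formula: the captions name three ingredients (the reward model's score, a conciseness judgement from the Mistral LM, and a KL-divergence term), but the additive combination, the coefficient `beta`, and the function names `sequence_kl` and `shaped_reward` are hypothetical.

```python
def sequence_kl(active_logprobs, reference_logprobs):
    """Sample-based KL estimate for one generated summary.

    Sums, over the generated tokens, the difference between the token
    log-probability under the active (fine-tuned) model and under the
    frozen reference model.
    """
    return sum(a - r for a, r in zip(active_logprobs, reference_logprobs))


def shaped_reward(entail_reward, quality_reward,
                  active_logprobs, reference_logprobs, beta=0.1):
    """Combine the three signals into one scalar for the PPO update.

    Assumptions (not taken from the paper): the signals are added, and
    the KL estimate is subtracted with weight `beta`, so drifting away
    from the reference model lowers the reward.
    """
    kl = sequence_kl(active_logprobs, reference_logprobs)
    return entail_reward + quality_reward - beta * kl


# Toy numbers: a summary of two tokens.
reward = shaped_reward(
    entail_reward=1.0,                 # score from the entailment reward model
    quality_reward=0.5,                # conciseness score (r_quality)
    active_logprobs=[-1.0, -2.0],      # log p_active(token)
    reference_logprobs=[-1.5, -2.5],   # log p_ref(token)
    beta=0.1,
)
print(reward)
```

In this toy case the active model assigns higher likelihood to its own tokens than the reference model does, so the KL estimate is positive and the final reward is slightly below the sum of the entailment and quality scores.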