Give Me More Details: Improving Fact-Checking with Latent Retrieval

Xuming Hu, Junzhe Chen, Zhijiang Guo, Philip S. Yu

TL;DR

This work tackles real-world fact-checking by moving beyond reliance on search snippets to leverage full-text source documents as evidence. It introduces XFact (multilingual) and EFact (English) datasets and a latent-variable SCALE model that jointly extracts evidence and verifies claims. Experiments show that document-rich evidence provides substantial contextual clues, improving accuracy and robustness across domains and languages, especially with joint training. The approach highlights practical gains for evidence-based fact-checking while acknowledging limitations and pointing to future directions such as non-text evidence and source trust considerations.

Abstract

Evidence plays a crucial role in automated fact-checking. When verifying real-world claims, existing fact-checking systems either assume that the evidence sentences are given or use the search snippets returned by a search engine. Such methods ignore the challenges of collecting evidence and may not provide sufficient information to verify real-world claims. To build a better fact-checking system, we propose to incorporate the full text of source documents as evidence and introduce two enriched datasets. The first is multilingual, while the second is monolingual (English). We further develop a latent variable model that jointly extracts evidence sentences from documents and performs claim verification. Experiments indicate that including source documents can provide sufficient contextual clues even when gold evidence sentences are not annotated. The proposed system achieves significant improvements over the best-reported models under different settings.

Paper Structure (35 sections, 7 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: An example claim from the XFact dataset, translated into English for illustration. Search snippet 1, a short summary of source document 1, is generated automatically by the search engine. Based only on the search snippets, one would predict the claim to be true, but the source document shows that the claim is false.
  • Figure 2: Comparison of information sufficiency, redundancy, and prediction accuracy when humans are given search snippets and source documents.
  • Figure 3: Overview of the model.
  • Figure 4: Factor graph for the evidence extractor.
  • Figure 5: Effects of the factors BUDGET and PAIR. BUDGET is imposed to control the sparsity of the sentence selection, with the hyperparameter $K$ setting the level of sparsity. PAIR is imposed to encourage contiguity.
  • ...and 2 more figures