Context Shapes LLMs Retrieval-Augmented Fact-Checking Effectiveness

Pietro Bernardelle, Stefano Civelli, Kevin Roitero, Gianluca Demartini

TL;DR

LLMs exhibit non-trivial parametric knowledge of factual claims, and their verification accuracy generally declines as context length increases, underscoring the importance of prompt structure in retrieval-augmented fact-checking systems.

Abstract

Large language models (LLMs) show strong reasoning abilities across diverse tasks, yet their performance on extended contexts remains inconsistent. While prior research has emphasized mid-context degradation in question answering, this study examines the impact of context in LLM-based fact verification. Using three datasets (HOVER, FEVEROUS, and ClimateFEVER) and five open-source models across different parameter sizes (7B, 32B, and 70B parameters) and model families (Llama-3.1, Qwen2.5, and Qwen3), we evaluate both parametric factual knowledge and the impact of evidence placement across varying context lengths. We find that LLMs exhibit non-trivial parametric knowledge of factual claims and that their verification accuracy generally declines as context length increases. Consistent with previous work, in-context evidence placement plays a critical role: accuracy is consistently higher when relevant evidence appears near the beginning or end of the prompt and lower when it is placed mid-context. These results underscore the importance of prompt structure in retrieval-augmented fact-checking systems.

Paper Structure (13 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Accuracy as a function of context length averaged over evidence-placement runs.
  • Figure 2: Accuracy vs. evidence depth (0–100%) for the five investigated LLMs across four context lengths (2k/4k/8k/16k). Each panel shows how verification accuracy changes as the evidence block is moved through the prompt while total input length is held constant using filler text; horizontal baselines mark parametric-only (claim only) and claim+evidence without filler.
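
The evidence-depth setup described for Figure 2 can be illustrated with a minimal sketch. It is not the authors' code: the function name `place_evidence`, the use of whitespace-separated words as a stand-in for model tokens, and the way filler is cycled to reach the length budget are all assumptions made for illustration.

```python
def place_evidence(evidence: str, filler: list[str], depth: float, budget: int) -> str:
    """Build a context of about `budget` words with `evidence` inserted at `depth`.

    Hypothetical helper: words approximate tokens, and `filler` is any list of
    irrelevant words that is cycled until the budget is filled.
    depth=0.0 puts the evidence first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")

    evidence_words = evidence.split()
    n_filler = max(budget - len(evidence_words), 0)
    if n_filler > 0 and not filler:
        raise ValueError("filler must be non-empty when padding is needed")

    # Cycle through the filler words so the total length stays constant
    # no matter where the evidence ends up.
    padding = [filler[i % len(filler)] for i in range(n_filler)]

    # Number of filler words that precede the evidence block.
    cut = int(depth * n_filler)
    return " ".join(padding[:cut] + evidence_words + padding[cut:])


# Sweep the evidence through a fixed-length context.
for depth in (0.0, 0.5, 1.0):
    context = place_evidence("A B C", ["x"], depth, budget=10)
    print(f"{depth:.1f}: {context}")
```

Holding `budget` fixed while varying `depth` isolates the effect of position from the effect of length, which is the comparison each panel of the figure makes. A real experiment would count tokens with the model's own tokenizer rather than splitting on whitespace.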