Context Shapes LLMs Retrieval-Augmented Fact-Checking Effectiveness

Pietro Bernardelle; Stefano Civelli; Kevin Roitero; Gianluca Demartini

TL;DR

LLMs exhibit non-trivial parametric knowledge of factual claims, and their verification accuracy generally declines as context length increases, underscoring the importance of prompt structure in retrieval-augmented fact-checking systems.

Abstract

Large language models (LLMs) show strong reasoning abilities across diverse tasks, yet their performance on extended contexts remains inconsistent. While prior research has emphasized mid-context degradation in question answering, this study examines the impact of context on LLM-based fact verification. Using three datasets (HOVER, FEVEROUS, and ClimateFEVER) and five open-source models spanning different parameter sizes (7B, 32B, and 70B) and model families (Llama-3.1, Qwen2.5, and Qwen3), we evaluate both parametric factual knowledge and the impact of evidence placement across varying context lengths. We find that LLMs exhibit non-trivial parametric knowledge of factual claims and that their verification accuracy generally declines as context length increases. Consistent with prior work, the placement of evidence within the context plays a critical role: accuracy is consistently higher when relevant evidence appears near the beginning or end of the prompt and lower when it is placed mid-context. These results underscore the importance of prompt structure in retrieval-augmented fact-checking systems.
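The evidence-placement protocol summarized above (moving the evidence block through a prompt whose total length is held fixed with filler text) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `build_context` and `build_prompt`, the prompt template, and the use of whitespace-separated words as a stand-in for model tokens are all hypothetical.

```python
from itertools import cycle


def build_context(evidence: str, filler_sentences: list[str],
                  target_words: int, depth: float) -> str:
    """Hypothetical helper: embed `evidence` at relative `depth` in filler text.

    Length is approximated by whitespace-separated words (an assumption; a
    real setup would count model tokens). Filler sentences are added only
    while the total stays within `target_words`; the evidence is never cut.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    # Drop empty sentences so the filling loop below always makes progress.
    sentences = [s for s in filler_sentences if s.split()]
    if not sentences:
        raise ValueError("need at least one non-empty filler sentence")

    budget = target_words - len(evidence.split())
    filler: list[str] = []
    used = 0
    # Repeat the filler sentences until the next one would exceed the budget.
    for sentence in cycle(sentences):
        n = len(sentence.split())
        if used + n > budget:
            break
        filler.append(sentence)
        used += n

    # depth=0.0 puts the evidence first, depth=1.0 puts it last.
    cut = round(depth * len(filler))
    return " ".join(filler[:cut] + [evidence] + filler[cut:])


def build_prompt(claim: str, evidence: str, filler_sentences: list[str],
                 target_words: int, depth: float) -> str:
    """Hypothetical prompt template wrapping the padded context and the claim."""
    context = build_context(evidence, filler_sentences, target_words, depth)
    return (f"Context:\n{context}\n\n"
            f"Claim: {claim}\n"
            "Is the claim supported by the context? Answer with a single verdict label.")
```

Calling `build_context` with the same evidence, filler, and `target_words` but different `depth` values yields contexts of identical length that differ only in where the evidence sits, which is the property the placement experiments rely on.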

Paper Structure (13 sections, 2 figures, 2 tables)

This paper contains 13 sections, 2 figures, 2 tables.

Introduction
Background and Related Work
Automated Fact-Checking
Context Processing in LLMs
Methodology
Datasets
Language Models
Experimental Setting
Results
Parametric Knowledge and Evidence Gains
The Effect of Context Length
Evidence Placement Effects
Conclusion and Future Work

Figures (2)

Figure 1: Accuracy as a function of context length averaged over evidence-placement runs.
Figure 2: Accuracy vs. evidence depth (0-100%) for the five investigated LLMs across four context lengths (2k/4k/8k/16k). Each panel shows how verification accuracy changes as the evidence block is moved through the prompt while total input length is held constant using filler text; horizontal lines mark the parametric-only baseline (claim only) and the claim+evidence baseline without filler.
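The averaging described in the Figure 1 caption could be computed roughly as below. This is a minimal sketch, assuming each prediction is logged as a hypothetical `(context_length, evidence_depth, correct)` tuple and that each evidence-placement run receives equal weight; the record format and function names are illustrative, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record format: (context_length, evidence_depth, prediction_correct).
Record = tuple[int, float, bool]


def accuracy_by_depth(records: list[Record]) -> dict[tuple[int, float], float]:
    """Accuracy of each placement run, keyed by (context length, evidence depth)."""
    buckets: dict[tuple[int, float], list[bool]] = defaultdict(list)
    for length, depth, correct in records:
        buckets[(length, depth)].append(correct)
    return {key: sum(hits) / len(hits) for key, hits in buckets.items()}


def accuracy_by_length(records: list[Record]) -> dict[int, float]:
    """Mean accuracy per context length, giving each placement run equal weight."""
    by_length: dict[int, list[float]] = defaultdict(list)
    for (length, _depth), acc in accuracy_by_depth(records).items():
        by_length[length].append(acc)
    return {length: mean(accs) for length, accs in sorted(by_length.items())}
```

`accuracy_by_depth` gives the per-depth points of a Figure 2 style panel, and `accuracy_by_length` collapses them into one value per context length. Weighting runs equally matches a pooled mean over all predictions only when every depth is evaluated on the same number of claims.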
