Correctness is not Faithfulness in RAG Attributions

Jonas Wallat, Maria Heuss, Maarten de Rijke, Avishek Anand

TL;DR

The paper argues that citation correctness alone is insufficient for trustworthy RAG attributions and introduces citation faithfulness as a causal requirement linking cited documents to model-generated statements. It formalizes the notion, proposes desiderata for good attributions, and demonstrates through an RTG-based experiment with adversarial insertions that post-rationalization can lead to unfaithful citations in grounded generation. By designing tests and outlining evaluation strategies (data contamination, model probing, counterfactual setups), the authors highlight the need for causal verification of evidence use in answers, with implications for high-stakes information retrieval and user trust. The work emphasizes human-verifiable attribution and calls for broader empirical validation across models and domains to establish reliable, faithful grounding in LLM-based QA systems.

Abstract

Retrieving relevant context is a common approach to reduce hallucinations and enhance answer reliability. Explicitly citing source documents allows users to verify generated responses and increases trust. Prior work largely evaluates citation correctness: whether cited documents support the corresponding statements. But citation correctness alone is insufficient. To establish trust in attributed answers, we must examine both citation correctness and citation faithfulness. In this work, we first disentangle the notions of citation correctness and faithfulness, which have been applied inconsistently in previous studies. Faithfulness ensures that the model's reliance on cited documents is genuine, reflecting actual reference use rather than superficial alignment with prior beliefs, which we call post-rationalization. We design an experiment that reveals the prevalent issue of post-rationalization, which undermines reliable attribution and may result in misplaced trust. Our findings suggest that current attributed answers often lack citation faithfulness (up to 57 percent of the citations), highlighting the need to evaluate correctness and faithfulness for trustworthy attribution in language models.

Paper Structure

This paper contains 14 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Different answer scenarios for the query "What is the capital of Germany?" (a) The ideal case, i.e., a correct citation that is faithful to the answer's generation process. (b) A citation referring to the context that was used during the answer generation but does not contain the statement itself. (c) A correct but unfaithful citation, where the model post-rationalizes a citation to fit its prior. (d) An incorrect citation.
  • Figure 2: Different methods of attribution generation, ordered by their likelihood of unfaithful behavior and post-rationalization (approaches more likely to behave faithfully are on the right).
  • Figure 3: Results of the post-rationalization tests. We measure the cases in which the model cited our adversarial document (which had the previously cited statement appended). Since we also change the input, the model is not guaranteed to produce the same statements again. Therefore, we also include the number of cases where we could match the old statement.
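
The two counts described in the Figure 3 caption could be tallied roughly as follows. This is a minimal sketch, not the authors' code: the `TestCase` record, its field names, and the `summarize` function are hypothetical, and it assumes each test case already records whether the earlier statement could be matched in the new answer and whether the adversarial document was cited for it.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    """Hypothetical record for one post-rationalization test."""
    statement_matched: bool   # the old statement was found again in the new answer
    cited_adversarial: bool   # the adversarial document was cited for that statement


def summarize(cases):
    """Count matched statements and, among them, citations of the adversarial document."""
    matched = [c for c in cases if c.statement_matched]
    cited = sum(1 for c in matched if c.cited_adversarial)
    # Report the rate only over matched cases, since an unmatched statement
    # cannot be checked for a changed citation.
    rate = cited / len(matched) if matched else 0.0
    return {
        "total": len(cases),
        "matched": len(matched),
        "cited_adversarial": cited,
        "rate": rate,
    }


# Made-up example data, for illustration only.
cases = [
    TestCase(statement_matched=True, cited_adversarial=True),
    TestCase(statement_matched=True, cited_adversarial=False),
    TestCase(statement_matched=False, cited_adversarial=False),
    TestCase(statement_matched=True, cited_adversarial=True),
]
print(summarize(cases))
```

Reporting the matched count next to the citation count mirrors the caption's point that the model is not guaranteed to reproduce the same statement once the input changes.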