Evidence Contextualization and Counterfactual Attribution for Conversational QA over Heterogeneous Data with RAG Systems

Rishiraj Saha Roy, Joel Schlotthauer, Chris Hinze, Andreas Foltyn, Luzian Hahn, Fabian Kuech

TL;DR

This work tackles ConvQA over heterogeneous enterprise data by addressing two core issues in current RAG pipelines: insufficient contextualization of retrieved evidence and non-causal attribution explanations. It introduces RAGonite, a system that contextualizes evidence by adding page titles, headings, and surrounding text, and that computes counterfactual attribution to causally explain how each piece of evidence contributed to an answer. A new bilingual benchmark, ConfQuestions, with 300 questions, each in English and German, over 215 Confluence pages, enables evaluation of retrieval, generation, and attribution for heterogeneous content. Results show that contextualized evidence improves both retrieval and answer quality, while counterfactual attribution yields explanations with close to 80% accuracy, pointing to practical improvements for trustworthy ConvQA over enterprise data.
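
To make the contextualization idea concrete, here is a minimal sketch of how a stored evidence could be enriched with its page title, heading, and surrounding text before indexing or prompting. The `Evidence` class, its field names, and the label layout are illustrative assumptions, not the format RAGonite actually uses.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical container for one retrievable unit plus its document context."""
    text: str
    page_title: str = ""
    heading: str = ""
    text_before: str = ""
    text_after: str = ""


def contextualize(ev: Evidence) -> str:
    """Join the raw text with whatever context is available (assumed layout)."""
    parts = [
        ("Page title", ev.page_title),
        ("Heading", ev.heading),
        ("Preceding text", ev.text_before),
        ("Content", ev.text),
        ("Following text", ev.text_after),
    ]
    # Skip empty fields so evidence without context stays compact.
    return "\n".join(f"{label}: {value}" for label, value in parts if value)


row = Evidence(
    text="Alice | update benchmark",
    page_title="RAG meeting notes",
    heading="October",
)
print(contextualize(row))
```

With the title and heading attached, a question that mentions the meeting or the month can match this table row even though the row text itself names neither.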

Abstract

Retrieval Augmented Generation (RAG) serves as a backbone for interacting with an enterprise's own data via Conversational Question Answering (ConvQA). In a RAG system, a retriever fetches passages from a collection in response to a question, which are then included in the prompt of a large language model (LLM) for generating a natural language (NL) answer. However, several RAG systems today suffer from two shortcomings: (i) retrieved passages usually contain only their raw text and lack appropriate document context, negatively impacting both retrieval and answering quality; and (ii) attribution strategies that explain answer generation typically rely only on similarity between the answer and the retrieved passages, thereby generating explanations that are plausible but not causal. In this work, we demonstrate RAGONITE, a RAG system that remedies the above concerns by: (i) contextualizing evidence with source metadata and surrounding text; and (ii) computing counterfactual attribution, a causal explanation approach where the contribution of a piece of evidence to an answer is determined by the similarity of the original response to the answer obtained by removing that evidence. To evaluate our proposals, we release a new benchmark ConfQuestions: it has 300 hand-created conversational questions, each in English and German, coupled with ground truth URLs, completed questions, and answers from 215 public Confluence pages. These documents are typical of enterprise wiki spaces with heterogeneous elements. Experiments with RAGONITE on ConfQuestions show the viability of our ideas: contextualization improves RAG performance, and counterfactual explanations outperform standard attribution.
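
The counterfactual attribution described above can be sketched as a leave-one-out loop: regenerate the answer without one evidence and score that evidence by how much the answer changes. This is a minimal sketch under stated assumptions. The `generate` callable stands in for an LLM call, `toy_generate` is a deterministic placeholder, and token-level Jaccard similarity is a stand-in for whatever similarity measure the system uses.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed stand-in for the real similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def counterfactual_attribution(
    question: str,
    evidences: Sequence[str],
    generate: Callable[[str, Sequence[str]], str],
    similarity: Callable[[str, str], float] = jaccard,
) -> List[float]:
    """Score each evidence by how much the answer changes when it is removed."""
    original = generate(question, evidences)
    scores = []
    for i in range(len(evidences)):
        reduced = [e for j, e in enumerate(evidences) if j != i]
        counterfactual = generate(question, reduced)
        # Low similarity to the original answer means high contribution.
        scores.append(1.0 - similarity(original, counterfactual))
    return scores


def toy_generate(question: str, evidences: Sequence[str]) -> str:
    """Hypothetical placeholder for an LLM: echoes evidences sharing a question word."""
    q = set(question.lower().split())
    used = [e for e in evidences if q & set(e.lower().split())]
    return " ".join(used) if used else "unknown"


evidences = [
    "alice todo is to update the benchmark",
    "the cafeteria opens at noon",
    "the meeting was held in october",
]
scores = counterfactual_attribution("alice todo", evidences, toy_generate)
print(scores)  # [1.0, 0.0, 0.0]
```

Only the first evidence changes the toy answer when removed, so it receives the full score. Unlike a plain answer-to-passage similarity check, this requires one extra generation call per evidence.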

Paper Structure

This paper contains 12 sections, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: The RAGonite workflow enhances RAG pipelines at both ends, preprocessing evidence and explaining answers.
  • Figure 2: An annotated walkthrough of the RAGonite demo. Blue boxes guide the reader and are not part of the UI (Sec. \ref{sec:walkthrough}).
  • Figure 3: Contextualized evidence as it appears in the answer prompt.
  • Figure 4: Answer explanation by counterfactual attribution.
  • Figure 5: Toy wiki page with heterogeneous elements to motivate evidence contextualization. A question like "todo for alice in oct rag meeting?" can only be faithfully answered by joining information in the relevant table row, table footer, page title, and preceding heading and text. This implies that unless the evidence as stored in the DB contains the supporting context, there is no chance that a retriever can fetch it from the corpus with the question as a search query.
  • ...and 3 more figures