CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation

Nengbo Wang, Xiaotian Han, Jagdip Singh, Jing Ma, Vipin Chaudhary

TL;DR

The paper addresses limitations of retrieval-augmented generation in knowledge-intensive tasks, where text chunking disrupts context and semantic similarity fails to capture causal relevance. It introduces CausalRAG, a framework that builds and traces causally grounded graphs over external knowledge to guide retrieval and grounding, thereby improving answer faithfulness, context recall, and context precision. Through experiments on OpenAlex-derived academic papers and a qualitative case study, CausalRAG outperforms regular RAG and graph-based RAG baselines, demonstrating that grounding retrieval in causal reasoning yields more accurate, interpretable, and less hallucination-prone responses. The work highlights the practical potential of integrating causal reasoning into RAG and outlines scalability and domain-adaptation directions for long-context knowledge-intensive tasks.

Abstract

Large language models (LLMs) have revolutionized natural language processing (NLP), particularly through Retrieval-Augmented Generation (RAG), which enhances LLM capabilities by integrating external knowledge. However, traditional RAG systems face critical limitations, including disrupted contextual integrity due to text chunking, and over-reliance on semantic similarity for retrieval. To address these issues, we propose CausalRAG, a novel framework that incorporates causal graphs into the retrieval process. By constructing and tracing causal relationships, CausalRAG preserves contextual continuity and improves retrieval precision, leading to more accurate and interpretable responses. We evaluate CausalRAG against regular RAG and graph-based RAG approaches, demonstrating its superiority across several metrics. Our findings suggest that grounding retrieval in causal reasoning provides a promising approach to knowledge-intensive tasks.

Paper Structure

This paper contains 26 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Analytical and experimental studies reveal limitations in regular RAG and GraphRAG. (a) identifies three key retrieval and generation issues in regular RAG; (b) evaluates RAG via context precision and recall, showing regular RAG excels in recall but lacks precision. GraphRAG improves precision but trades off some recall.
  • Figure 2: Overview of CausalRAG's architecture. Documents are indexed as graphs, and queries retrieve causally related nodes. A causal summary is generated and combined with the query to ensure grounded responses.
  • Figure 3: Performance comparison of CausalRAG, regular RAG, and other graph-based RAGs across three key metrics: answer faithfulness, context recall, and context precision.
  • Figure 4: Case Study – A user uploads a long paper and asks a related question. This figure compares Regular RAG, GraphRAG, and CausalRAG by analyzing their retrieval processes. It highlights the drawbacks of semantic and graph-based retrieval and shows how causal reasoning in CausalRAG leads to more robust and precise results.
  • Figure 5: Case Study – A follow-up experiment evaluates the RAGs in the previous case using three versions of the same paper. Graph-based methods improve with length, while CausalRAG remains consistently robust.
  • ...and 2 more figures
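
The flow described in the Figure 2 caption (index documents as a graph, retrieve causally related nodes, build a causal summary, combine it with the query) can be illustrated with a small sketch. This is not the authors' implementation. The edge list is hand-written toy data, word overlap stands in for whatever node-matching method the paper uses, and the names `CAUSAL_EDGES`, `match_nodes`, `trace_causal_paths`, and `build_prompt` are hypothetical.

```python
from collections import deque

# Hypothetical toy data: (cause, effect) pairs that a real system would
# extract from documents. These edges are illustrative only.
CAUSAL_EDGES = [
    ("text chunking", "broken context"),
    ("broken context", "incomplete retrieval"),
    ("incomplete retrieval", "hallucinated answer"),
    ("semantic similarity search", "causally irrelevant context"),
    ("causally irrelevant context", "hallucinated answer"),
]


def build_graph(edges):
    """Index (cause, effect) pairs as a cause -> [effects] adjacency map."""
    effects = {}
    for cause, effect in edges:
        effects.setdefault(cause, []).append(effect)
    return effects


def match_nodes(query, nodes):
    """Assumed stand-in for node matching: keep nodes sharing a word with the query."""
    query_words = set(query.lower().split())
    return [n for n in nodes if query_words & set(n.split())]


def trace_causal_paths(seeds, effects, max_hops=2):
    """Breadth-first expansion along cause -> effect edges, up to max_hops edges."""
    paths = []
    queue = deque((seed, [seed]) for seed in seeds)
    while queue:
        node, path = queue.popleft()
        for nxt in effects.get(node, []):
            if nxt in path:  # skip cycles
                continue
            new_path = path + [nxt]
            paths.append(new_path)
            if len(new_path) - 1 < max_hops:
                queue.append((nxt, new_path))
    return paths


def build_prompt(query, edges, max_hops=2):
    """Combine a causal summary of the traced paths with the user query."""
    effects = build_graph(edges)
    nodes = sorted({n for edge in edges for n in edge})
    seeds = match_nodes(query, nodes)
    paths = trace_causal_paths(seeds, effects, max_hops)
    summary = "; ".join(" -> ".join(p) for p in paths)
    return f"Causal context: {summary}\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("Why does text chunking hurt answers?", CAUSAL_EDGES))
```

In this sketch the prompt carries whole cause-to-effect chains rather than isolated chunks, which is the intuition behind the caption's "causal summary". A real system would replace the toy edge list and the word-overlap matcher with learned components.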