CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation
Nengbo Wang, Xiaotian Han, Jagdip Singh, Jing Ma, Vipin Chaudhary
TL;DR
Retrieval-augmented generation (RAG) struggles on knowledge-intensive tasks because text chunking disrupts context and semantic similarity fails to capture causal relevance. The paper introduces CausalRAG, a framework that builds and traces causally grounded graphs over external knowledge to guide retrieval and grounding, improving answer faithfulness, context recall, and context precision. In experiments on academic papers drawn from OpenAlex and in a qualitative case study, CausalRAG outperforms regular RAG and graph-based RAG baselines, suggesting that grounding retrieval in causal reasoning yields more accurate, more interpretable, and less hallucination-prone responses. The authors also outline directions for scalability and domain adaptation in long-context, knowledge-intensive tasks.
Abstract
Large language models (LLMs) have revolutionized natural language processing (NLP), particularly through Retrieval-Augmented Generation (RAG), which enhances LLM capabilities by integrating external knowledge. However, traditional RAG systems face critical limitations, including disrupted contextual integrity due to text chunking, and over-reliance on semantic similarity for retrieval. To address these issues, we propose CausalRAG, a novel framework that incorporates causal graphs into the retrieval process. By constructing and tracing causal relationships, CausalRAG preserves contextual continuity and improves retrieval precision, leading to more accurate and interpretable responses. We evaluate CausalRAG against regular RAG and graph-based RAG approaches, demonstrating its superiority across several metrics. Our findings suggest that grounding retrieval in causal reasoning provides a promising approach to knowledge-intensive tasks.
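To make the idea of "constructing and tracing causal relationships" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the causal edges have already been extracted from the source documents, and it uses word overlap as a stand-in for the embedding-based matching a real system would likely use. The names `CAUSAL_EDGES`, `build_graph`, `seed_nodes`, `trace`, and `retrieve` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy graph: each (cause, effect) pair is assumed to have been
# extracted from the external documents in an earlier step.
CAUSAL_EDGES = [
    ("text chunking", "disrupted context"),
    ("disrupted context", "incomplete retrieval"),
    ("incomplete retrieval", "hallucinated answer"),
    ("semantic similarity search", "causally irrelevant passages"),
]


def build_graph(edges):
    """Return adjacency lists for both directions: cause->effects and effect->causes."""
    forward, backward = {}, {}
    for cause, effect in edges:
        forward.setdefault(cause, []).append(effect)
        backward.setdefault(effect, []).append(cause)
    return forward, backward


def seed_nodes(query, forward, backward):
    """Pick nodes that share at least one word with the query.

    Word overlap is an assumption made to keep the sketch dependency-free;
    it stands in for a semantic (embedding) match.
    """
    words = set(query.lower().split())
    nodes = set(forward) | set(backward)
    return sorted(n for n in nodes if words & set(n.split()))


def trace(seeds, adjacency, max_hops):
    """Breadth-first expansion from the seeds along causal edges, up to max_hops."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def retrieve(query, edges, max_hops=2):
    """Return the causal edges reachable from the query's seed nodes."""
    forward, backward = build_graph(edges)
    seeds = seed_nodes(query, forward, backward)
    # Trace downstream effects and upstream causes of the matched nodes.
    kept = trace(seeds, forward, max_hops) | trace(seeds, backward, max_hops)
    # Keep only edges whose endpoints were both reached, in original order.
    return [(c, e) for c, e in edges if c in kept and e in kept]


if __name__ == "__main__":
    for cause, effect in retrieve("why does text chunking hurt answers", CAUSAL_EDGES):
        print(f"{cause} -> {effect}")
```

The returned edges form a connected causal chain rather than a set of isolated text chunks, which is the property the abstract credits with preserving contextual continuity. In a full pipeline this chain would be passed to the LLM as grounding context; that step is omitted here.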
