HugRAG: Hierarchical Causal Knowledge Graph Design for RAG

Nengbo Wang, Tuo Liang, Vikash Singh, Chaoda Song, Van Yang, Yu Yin, Jing Ma, Jagdip Singh, Vipin Chaudhary

TL;DR

HugRAG tackles the recall-precision trade-off in retrieval-augmented generation by introducing hierarchical causal gates that connect modular knowledge components and enable explicit causal reasoning across large, heterogeneous graphs. The method constructs a multi-level knowledge graph and defines sparse causal gates offline. At query time it performs causally guided expansion, followed by a spurious-aware grounding step that yields a causally validated subgraph S*. Across diverse datasets, including the newly introduced HolisQA, HugRAG achieves superior recall without sacrificing grounding and remains scalable as the corpus grows. HolisQA is designed to test holistic comprehension, and the results show that causal gating provides robustness not only on standard QA but also on open-ended reasoning tasks. Overall, HugRAG establishes a principled foundation for scalable, causally grounded RAG systems, with practical implications for reliable knowledge-grounded generation.
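As a rough illustration of the offline stage, the sketch below keeps relations inside a module as ordinary edges and admits a cross-module relation only as a gate, and only if a causal score clears a threshold; at most `max_gates_per_pair` gates survive per module pair, which keeps the gate set sparse. The module assignment, the `causal_score` function (a stand-in for the LLM's causal judgment), the threshold, and the per-pair cap are all hypothetical: this summary does not specify how HugRAG selects gates, and only one level of the hierarchy is modeled.

```python
from collections import defaultdict


def build_gated_graph(module_of, edges, causal_score, threshold=0.5, max_gates_per_pair=1):
    """Split extracted relations into intra-module edges and sparse causal gates.

    module_of:    entity -> module id (assumed output of a partitioning step)
    edges:        iterable of (u, v) relations extracted from text
    causal_score: hypothetical (u, v) -> [0, 1] stand-in for the LLM causal judge
    """
    intra = defaultdict(set)        # undirected adjacency inside each module
    candidates = defaultdict(list)  # (module_u, module_v) -> [(score, u, v)]
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        if mu == mv:
            intra[u].add(v)
            intra[v].add(u)
        else:
            score = causal_score(u, v)
            if score >= threshold:  # only relations judged causal can become gates
                candidates[(mu, mv)].append((score, u, v))
    gates = []
    for pair in sorted(candidates):
        # Keep only the highest-scoring relations per module pair so gates stay sparse.
        best = sorted(candidates[pair], reverse=True)[:max_gates_per_pair]
        gates.extend((u, v) for _, u, v in best)
    return dict(intra), gates


# Toy example (hypothetical entities, modules, and scores).
module_of = {"storm": "weather", "heatwave": "weather",
             "substation_fault": "grid", "blackout": "grid",
             "traffic_jam": "transport"}
edges = [("storm", "heatwave"), ("substation_fault", "blackout"),
         ("storm", "substation_fault"), ("heatwave", "blackout"),
         ("blackout", "traffic_jam")]
scores = {("storm", "substation_fault"): 0.9,
          ("heatwave", "blackout"): 0.6,
          ("blackout", "traffic_jam"): 0.3}
intra, gates = build_gated_graph(module_of, edges, lambda u, v: scores.get((u, v), 0.0))
print(gates)  # [('storm', 'substation_fault')]
```

In the toy graph the weather-to-grid pair has two causal candidates but only the stronger one becomes a gate, and the weak blackout-to-traffic relation is dropped. Discarding non-causal cross-module relations entirely is a simplifying assumption of this sketch.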

Abstract

Retrieval-augmented generation (RAG) has enhanced large language models by enabling access to external knowledge, with graph-based RAG emerging as a powerful paradigm for structured retrieval and reasoning. However, existing graph-based methods often over-rely on surface-level node matching and lack explicit causal modeling, leading to unfaithful or spurious answers. Prior attempts to incorporate causality are typically limited to local or single-document contexts, and they also suffer from information isolation arising from modular graph structures; this isolation hinders scalability and cross-module causal reasoning. To address these challenges, we propose HugRAG, a framework that rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. HugRAG explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics. Our work establishes a principled foundation for structured, scalable, and causally grounded RAG systems.

Paper Structure

This paper contains 66 sections, 5 equations, 15 figures, 6 tables, and 2 algorithms.

Figures (15)

  • Figure 1: Comparison of three retrieval paradigms, Standard RAG, Graph-based RAG, and HugRAG, on a citywide blackout query. Standard RAG misses key evidence under semantic retrieval. Graph-based RAG can be trapped by intrinsic modularity or grouping structure. HugRAG leverages hierarchical causal gates to bridge modular boundaries, effectively breaking information isolation and explicitly identifying the underlying causal path.
  • Figure 2: Overview of the HugRAG pipeline. In the offline stage, raw texts are embedded to build a knowledge graph and a vector store; partitioning then forms a hierarchical graph, and an LLM identifies causal relations to construct a graph with causal gates. In the online stage, the query is embedded and scored to retrieve the top-K entities; an N-hop traversal then uses causal gates to cross modules and assemble a context subgraph, and an LLM further distinguishes causal from spurious relations to produce the final context and answer.
  • Figure 3: Ablation study. H: hierarchical structure; CG: causal gates; Causal/SP-Causal: standard vs. spurious-aware causal identification. "w/o" and "w/" denote exclusion and inclusion of a component, respectively.
  • Figure 4: Scalability analysis of HugRAG and other RAG baselines across varying source text lengths (5K to 1.5M characters).
  • Figure 5: Prompt for Causal Path Identification with Spurious Distinction (HugRAG Main Setting). The model is explicitly instructed to segregate non-causal associations into a separate list to enhance reasoning precision.
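The online stage in Figure 2 (top-K seeding, an N-hop traversal that leaves a module only through causal gates, then a causal-versus-spurious split) can be illustrated with a minimal sketch. It assumes precomputed entity embeddings, cosine similarity for scoring, directed gates, and a hypothetical `is_causal` judge standing in for the LLM prompt of Figure 5; these names and choices are illustrative, not taken from the paper. The returned `causal` edge list loosely plays the role of S*.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_subgraph(query_vec, entity_vecs, intra, gates, k=2, n_hops=2, is_causal=None):
    """Seed with top-K entities, expand up to N hops, then split edges into causal/spurious."""
    # 1) Top-K seed entities by cosine similarity to the query embedding.
    seeds = sorted(entity_vecs, key=lambda e: cosine(query_vec, entity_vecs[e]),
                   reverse=True)[:k]
    # Directed gate adjacency: in this sketch, the only way to leave a module.
    gate_out = {}
    for u, v in gates:
        gate_out.setdefault(u, set()).add(v)
    # 2) Breadth-first expansion up to n_hops over intra-module edges and gates.
    depth = {s: 0 for s in seeds}
    found = set()
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == n_hops:
            continue
        # Intra-module edges are undirected, so store them in canonical order.
        steps = [(nxt, tuple(sorted((node, nxt)))) for nxt in intra.get(node, ())]
        steps += [(nxt, (node, nxt)) for nxt in gate_out.get(node, ())]
        for nxt, edge in steps:
            found.add(edge)
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    # 3) Spurious-aware grounding: a judge (an LLM in the paper) labels each edge.
    judge = is_causal or (lambda u, v: True)
    causal = sorted(e for e in found if judge(*e))
    spurious = sorted(e for e in found if not judge(*e))
    return causal, spurious


# Toy graph (hypothetical): two modules joined by a single causal gate.
intra = {"storm": {"heatwave"}, "heatwave": {"storm"},
         "substation_fault": {"blackout"}, "blackout": {"substation_fault"}}
gates = [("storm", "substation_fault")]
vecs = {"storm": [1.0, 0.0], "heatwave": [0.8, 0.2],
        "substation_fault": [0.0, 1.0], "blackout": [0.1, 0.9]}
query = [1.0, 0.1]
spurious_pairs = {("heatwave", "storm")}  # hypothetical judge verdict
causal, spurious = retrieve_subgraph(query, vecs, intra, gates, k=1, n_hops=2,
                                     is_causal=lambda u, v: (u, v) not in spurious_pairs)
print(causal)    # [('blackout', 'substation_fault'), ('storm', 'substation_fault')]
print(spurious)  # [('heatwave', 'storm')]
```

Calling `retrieve_subgraph` with an empty gate list confines the traversal to the seed's module, so the blackout entity is never reached; this mirrors, in miniature, the information isolation that Figure 1 attributes to purely modular graph retrieval.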