HugRAG: Hierarchical Causal Knowledge Graph Design for RAG
Nengbo Wang, Tuo Liang, Vikash Singh, Chaoda Song, Van Yang, Yu Yin, Jing Ma, Jagdip Singh, Vipin Chaudhary
TL;DR
HugRAG tackles the recall-precision trade-off in retrieval-augmented generation by introducing hierarchical causal gates that connect modular knowledge components and enable explicit causal reasoning across large, heterogeneous graphs. The method constructs a multi-level knowledge graph and defines sparse causal gates offline. At query time it performs causally guided expansion, followed by a spurious-aware grounding step that yields a causally validated subgraph S^*. Across diverse datasets, including the newly introduced HolisQA, HugRAG achieves superior recall without sacrificing grounding, demonstrating strong performance and scalability. HolisQA is designed to challenge holistic comprehension, and the results show that causal gating provides robustness not only for standard QA but also for open-ended reasoning tasks. Overall, HugRAG establishes a principled foundation for scalable, causally grounded RAG systems, with practical implications for reliable knowledge-grounded generation.
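The following toy Python sketch makes concrete how gates can bridge otherwise isolated modules. It is not the authors' algorithm. The graph, the function names `gated_expand` and `spurious_filter`, the fixed hop budget, and the `is_spurious` predicate are all hypothetical. Gate edges are simply assumed to have been validated as causal offline. The sketch shows only two things: a sparse set of cross-module links lets a query-time expansion reach a second module, and a final filter prunes the result into a subgraph playing the role of S^*.

```python
from collections import deque


def gated_expand(seeds, intra_edges, gate_edges, max_hops):
    """Breadth-first expansion from seed nodes (hypothetical sketch).

    intra_edges: adjacency lists for edges inside a module.
    gate_edges:  sparse cross-module links, assumed precomputed offline.
    Returns every node reachable from the seeds within max_hops hops.
    """
    reached = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted along this path
        # Follow both intra-module edges and any gate leaving this node.
        neighbors = intra_edges.get(node, []) + gate_edges.get(node, [])
        for nb in neighbors:
            if nb not in reached:
                reached.add(nb)
                frontier.append((nb, depth + 1))
    return reached


def spurious_filter(nodes, is_spurious):
    """Placeholder grounding step: drop nodes a caller-supplied check flags."""
    return {n for n in nodes if not is_spurious(n)}


# Toy graph with two modules, A = {a1, a2} and B = {b1, b2}.
intra = {"a1": ["a2"], "a2": ["a1"], "b1": ["b2"], "b2": ["b1"]}
gates = {"a2": ["b1"]}  # a single sparse gate from module A into module B

without_gates = gated_expand(["a1"], intra, {}, max_hops=3)
with_gates = gated_expand(["a1"], intra, gates, max_hops=3)
s_star = spurious_filter(with_gates, lambda n: n == "b2")
```

Without the single gate edge, expansion from `a1` stays inside module A. With the gate, the search reaches module B, and the placeholder filter then removes the node flagged as spurious.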
Abstract
Retrieval-augmented generation (RAG) has enhanced large language models by enabling access to external knowledge, with graph-based RAG emerging as a powerful paradigm for structured retrieval and reasoning. However, existing graph-based methods often over-rely on surface-level node matching and lack explicit causal modeling, leading to unfaithful or spurious answers. Prior attempts to incorporate causality are typically limited to local or single-document contexts. They also suffer from information isolation arising from modular graph structures, which hinders scalability and cross-module causal reasoning. To address these challenges, we propose HugRAG, a framework that rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. HugRAG explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics. Our work establishes a principled foundation for structured, scalable, and causally grounded RAG systems.
