You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures

Shengyuan Chen, Chuang Zhou, Zheng Yuan, Qinggang Zhang, Zeyang Cui, Hao Chen, Yilin Xiao, Jiannong Cao, Xiao Huang

TL;DR

This work tackles hallucination and inefficiency in retrieval-augmented generation by removing the need for pre-built graphs. It introduces LogicRAG, which dynamically constructs a Query Logic Dependency Graph (a DAG) from the input query, linearizes reasoning with a topological order, and uses context and graph pruning to curb token usage while preserving reasoning quality. The framework achieves higher QA accuracy across multiple multi-hop benchmarks than state-of-the-art GraphRAG baselines and offers practical efficiency by eliminating offline graph construction and reducing redundant retrieval. The approach provides a principled, scalable solution for knowledge-intensive QA with large language models, suitable for dynamic knowledge bases and complex reasoning tasks.

Abstract

Large language models (LLMs) often suffer from hallucination, generating factually incorrect statements when handling questions beyond their knowledge and perception. Retrieval-augmented generation (RAG) addresses this by retrieving query-relevant contexts from knowledge bases to support LLM reasoning. Recent advances leverage pre-constructed graphs to capture the relational connections among distributed documents, showing remarkable performance in complex tasks. However, existing Graph-based RAG (GraphRAG) methods rely on a costly process to transform the corpus into a graph, introducing overwhelming token cost and update latency. Moreover, real-world queries vary in type and complexity, requiring different logic structures for accurate reasoning. The pre-built graph may not align with these required structures, resulting in ineffective knowledge retrieval. To this end, we propose a Logic-aware Retrieval-Augmented Generation framework (LogicRAG) that dynamically extracts reasoning structures at inference time to guide adaptive retrieval without any pre-built graph. LogicRAG begins by decomposing the input query into a set of subproblems and constructing a directed acyclic graph (DAG) to model the logical dependencies among them. To support coherent multi-step reasoning, LogicRAG then linearizes the graph using topological sort, so that subproblems can be addressed in a logically consistent order. Besides, LogicRAG applies graph pruning to reduce redundant retrieval and uses context pruning to filter irrelevant context, significantly reducing the overall token cost. Extensive experiments demonstrate that LogicRAG achieves both superior performance and efficiency compared to state-of-the-art baselines.
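To make the "DAG plus topological sort" step in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the subproblem labels (`q1` to `q4`), the `deps` mapping, and the `stub_answer` function are hypothetical placeholders, and the real system would obtain the decomposition and the answers from an LLM and a retriever. The sketch only shows how a dependency graph over subproblems can be linearized so that each subproblem is handled after the ones it depends on.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency graph: each subproblem maps to the set of
# subproblems whose answers it needs first (its predecessors in the DAG).
deps = {
    "q1": set(),          # e.g. identify the entity named in the query
    "q2": {"q1"},         # needs the answer to q1
    "q3": {"q1"},         # also needs the answer to q1
    "q4": {"q2", "q3"},   # combines the answers to q2 and q3
}

def linearize(dependencies):
    """Return the subproblems in an order where predecessors come first.

    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())

def stub_answer(node, context):
    """Placeholder for retrieval plus an LLM call (assumption, not a real API)."""
    used = ",".join(sorted(context)) or "nothing"
    return f"answer({node}) using {used}"

def solve_in_order(dependencies, answer_fn):
    """Answer each subproblem once, passing in its predecessors' answers."""
    answers = {}
    for node in linearize(dependencies):
        context = {p: answers[p] for p in dependencies[node]}
        answers[node] = answer_fn(node, context)
    return answers

order = linearize(deps)
answers = solve_in_order(deps, stub_answer)
```

In this sketch `q1` is always resolved first and `q4` last, while the relative order of `q2` and `q3` is left to the sorter because neither depends on the other. A cyclic dependency would raise `CycleError`, which is one reason a DAG is a natural fit for this kind of ordering.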

Paper Structure

This paper contains 19 sections, 6 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Token and runtime cost of the graph construction process of graph-based RAG methods on 2WikiMQA.
  • Figure 2: Illustration of the proposed LogicRAG.
  • Figure 3: Word-level Jaccard similarity between subqueries across rounds in the agentic RAG process, averaging across the dataset.
  • Figure 4: Comparison between two strategies: sampling w/ and w/o replacement.
  • Figure 5: Distribution of accuracy across question types. Each ball represents a question type, with the y-axis position indicating its accuracy and the radius reflecting its proportion in the dataset.
  • ...and 2 more figures
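
The word-level Jaccard similarity named in the Figure 3 caption is the size of the intersection of two word sets divided by the size of their union. The Python sketch below shows one way to compute it for a pair of subqueries. The tokenization (lowercasing and splitting on whitespace), the function name `word_jaccard`, and the example subqueries are assumptions made for illustration; the listing above does not say how the paper tokenizes text.

```python
def word_jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity |A & B| / |A | B| of two strings.

    Assumes lowercase, whitespace-split tokens. Returns 0.0 when both
    strings are empty, which is a convention chosen for this sketch.
    """
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)

# Two hypothetical subqueries from consecutive retrieval rounds.
similarity = word_jaccard("who directed the film", "who produced the film")
```

Here the two subqueries share three of five distinct words, so the similarity is 0.6. A high value between rounds suggests that successive subqueries overlap heavily.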