PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths

Boyu Chen, Zirui Guo, Zidan Yang, Yuluo Chen, Junze Chen, Zhenghao Liu, Chuan Shi, Cheng Yang

TL;DR

PathRAG identifies redundancy and flat prompting as core limitations of graph-based RAG. It introduces a flow-based, distance-aware path retrieval mechanism to extract reliable relational paths from an indexing graph and uses path-based prompting to preserve structural relations in the final prompt. Empirical results across diverse datasets show PathRAG consistently outperforms state-of-the-art baselines across multiple evaluation dimensions while reducing token consumption. The approach demonstrates robustness to graph sparsity and varying LLM backbones, offering a practical, token-efficient enhancement for retrieval-augmented generation in graph-structured knowledge databases.

Abstract

Retrieval-augmented generation (RAG) improves the response quality of large language models (LLMs) by retrieving knowledge from external databases. Typical RAG approaches split the text database into chunks, organizing them in a flat structure for efficient searches. To better capture the inherent dependencies and structured relationships across the text database, researchers propose to organize textual information into an indexing graph, known as graph-based RAG. However, we argue that the limitation of current graph-based RAG methods lies in the redundancy of the retrieved information, rather than its insufficiency. Moreover, previous methods use a flat structure to organize retrieved information within the prompts, leading to suboptimal performance. To overcome these limitations, we propose PathRAG, which retrieves key relational paths from the indexing graph, and converts these paths into textual form for prompting LLMs. Specifically, PathRAG effectively reduces redundant information with flow-based pruning, while guiding LLMs to generate more logical and coherent responses with path-based prompting. Experimental results show that PathRAG consistently outperforms state-of-the-art baselines across six datasets and five evaluation dimensions. The code is available at the following link: https://github.com/BUPT-GAMMA/PathRAG

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Comparison between different graph-based RAG methods. GraphRAG (Edge et al., 2024) uses all the information within certain communities, while LightRAG (Guo et al., 2024) uses all the immediate neighbors of query-related nodes. In contrast, the proposed PathRAG focuses on key relational paths between query-related nodes to alleviate noise and reduce token consumption.
  • Figure 2: The overall framework of our proposed PathRAG with three main stages. 1) Node Retrieval Stage: Relevant nodes are retrieved from the indexing graph based on the keywords in the query; 2) Path Retrieval Stage: We design a flow-based pruning algorithm to extract key relational paths between each pair of retrieved nodes, and then retrieve paths with the highest reliability scores; 3) Answer Generation Stage: The retrieved paths are placed into prompts in ascending order of reliability scores, and finally fed into an LLM for answer generation.
  • Figure 3: Performance of PathRAG, LightRAG, and NaiveRAG under different levels of graph sparsity on the Agriculture and CS datasets.
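
The path retrieval and prompt ordering described in the Figure 2 caption can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the decay factor `alpha`, the pruning threshold `theta`, the hop limit, and the choice to score a path by the mean resource of its nodes are all assumptions made for illustration, since this section does not give the paper's formulas. The sketch spreads one unit of "resource" outward from a retrieved node, stops propagating along branches whose share falls below the threshold, scores the surviving paths, and lists them in ascending order of score so that the most reliable path appears last in the prompt.

```python
from collections import defaultdict

def propagate_flow(graph, start, alpha=0.8, theta=0.05):
    """Spread one unit of resource from `start`, level by level.
    alpha (decay) and theta (pruning threshold) are hypothetical parameters."""
    resource = defaultdict(float)
    resource[start] = 1.0
    frontier, visited = [start], {start}
    while frontier:
        next_frontier = []
        for node in frontier:
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            # Each neighbor receives an equal, decayed share of the node's resource.
            share = alpha * resource[node] / len(neighbors)
            if share < theta:
                continue  # prune: too little flow to be worth propagating
            for nb in neighbors:
                resource[nb] += share
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return dict(resource)

def simple_paths(graph, src, dst, max_hops=3):
    """Enumerate cycle-free paths from src to dst with at most max_hops edges."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nb in graph.get(node, []):
            if nb not in path:
                stack.append((nb, path + [nb]))

def score_paths(graph, src, dst, alpha=0.8, theta=0.05, max_hops=3):
    """Score each path by the mean resource of its nodes after src (an assumption)."""
    resource = propagate_flow(graph, src, alpha, theta)
    scored = []
    for path in simple_paths(graph, src, dst, max_hops):
        hops = path[1:]
        if not hops:
            continue  # src == dst: nothing to score
        if all(resource.get(n, 0.0) > 0 for n in hops):
            scored.append((sum(resource[n] for n in hops) / len(hops), path))
    return scored

def order_for_prompt(scored, top_k=2):
    """Keep the top_k paths, then list them in ascending score (best one last)."""
    best = sorted(scored, key=lambda sp: sp[0], reverse=True)[:top_k]
    return [" -> ".join(p) for _, p in sorted(best, key=lambda sp: sp[0])]

# Toy indexing graph (hypothetical): two routes from A to D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"], "D": []}
ordered = order_for_prompt(score_paths(graph, "A", "D"))
print(ordered)
```

In this toy graph the two-hop route through B scores higher than the three-hop route through C and E, so it is placed last in the ordered list. Raising `theta` far enough cuts off the flow before it reaches D, and no path is returned at all.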