GRAG: Graph Retrieval-Augmented Generation

Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao

TL;DR

GRAG addresses the limitations of document-centric retrieval by handling textual graphs and their topology in both retrieval and generation. It introduces a divide-and-conquer ego-graph retrieval strategy with soft pruning and employs two complementary prompts (hierarchical text descriptions and graph embeddings) to inject joint textual-topological context into LLMs. Across GraphQA, WebQSP, and ExplaGraphs, GRAG substantially outperforms traditional RAG and even frozen LLM baselines, highlighting the value of graph-context-aware prompting for multi-hop reasoning. The work releases datasets and code, demonstrates cross-dataset transfer, and emphasizes that graph-aware retrieval can reduce training costs while improving factual alignment.

Abstract

Naive Retrieval-Augmented Generation (RAG) focuses on individual documents during retrieval and, as a result, falls short in handling networked documents, which are common in applications such as citation graphs, social media, and knowledge graphs. To overcome this limitation, we introduce Graph Retrieval-Augmented Generation (GRAG), which tackles the fundamental challenges of retrieving textual subgraphs and integrating joint textual and topological information into Large Language Models (LLMs) to enhance their generation. To enable efficient textual subgraph retrieval, we propose a novel divide-and-conquer strategy that retrieves the optimal subgraph structure in linear time. To achieve graph context-aware generation, we incorporate textual graphs into LLMs through two complementary views, the text view and the graph view, enabling LLMs to more effectively comprehend and utilize the graph context. Extensive experiments on graph reasoning benchmarks demonstrate that, in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods. Our datasets and the code for GRAG are available at https://github.com/HuieL/GRAG.
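
To make the retrieval idea in the abstract more concrete, the following is a minimal sketch of ego-graph retrieval, not the authors' implementation. It assumes each node already has an embedding, scores every k-hop ego-graph by the cosine similarity between the query embedding and the mean of its node embeddings, and merges the top-N ego-graphs into one node set. The function names (`k_hop_ego_graph`, `retrieve_subgraph`), the mean pooling, and the toy data layout are all illustrative assumptions; the soft pruning step is omitted.

```python
import math
from collections import deque


def k_hop_ego_graph(adj, center, k):
    """Return the set of nodes within k hops of `center` (breadth-first search)."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the k-hop boundary
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors (assumed pooling choice)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_subgraph(adj, node_emb, query_emb, k=1, top_n=2):
    """Rank all k-hop ego-graphs against the query and merge the top_n of them."""
    scored = []
    for center in adj:
        nodes = k_hop_ego_graph(adj, center, k)
        emb = mean_pool([node_emb[v] for v in nodes])
        scored.append((cosine(query_emb, emb), center, nodes))
    # Highest similarity first; ties broken by node id for determinism.
    scored.sort(key=lambda t: (-t[0], str(t[1])))
    merged = set()
    for _, _, nodes in scored[:top_n]:
        merged |= nodes
    return merged
```

In this sketch each ego-graph is embedded once and compared against the query independently, which is what makes the search a divide-and-conquer over small candidate subgraphs rather than a search over all possible subgraphs.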

Paper Structure

This paper contains 22 sections, 10 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: GRAG retrieves textual subgraphs relevant to the query, rather than discrete entities as in RAG. Entities with similar topics tend to have connections, which improves the precision and robustness of the retrieval phase.
  • Figure 2: Illustration of our GRAG approach.
  • Figure 3: Performance of our GRAG approach on WebQSP as the ego-graph size and number of ego-graphs used vary.
  • Figure 4: An example hierarchical description of a 2-hop ego-graph from a citation network.
  • Figure 5: Effects of the number of retrieved entities on the WebQSP dataset.
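
The hierarchical description mentioned in the Figure 4 caption can be illustrated with a small sketch. The exact template used in the paper is not reproduced on this page, so the code below only assumes one plausible layout: build a breadth-first tree rooted at the ego node, then print each node's text indented by its hop distance. The function name `hierarchical_description`, the indentation scheme, and the toy citation data are hypothetical.

```python
from collections import deque


def hierarchical_description(adj, text, center, k=2):
    """Render a k-hop ego-graph as an indented outline rooted at `center`."""
    children = {center: []}
    depth = {center: 0}
    queue = deque([center])
    # Build a BFS tree so every node appears at its true hop distance.
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                children[node].append(nbr)
                children[nbr] = []
                queue.append(nbr)

    lines = []

    def emit(node):
        # Pre-order traversal: a node is followed by its subtree.
        lines.append("  " * depth[node] + "- " + text[node])
        for child in children[node]:
            emit(child)

    emit(center)
    return "\n".join(lines)


# Toy citation network (hypothetical data).
adj = {"p0": ["p1", "p2"], "p1": ["p0", "p2", "p3"], "p2": ["p0", "p1"], "p3": ["p1"]}
text = {"p0": "Paper 0", "p1": "Paper 1", "p2": "Paper 2", "p3": "Paper 3"}
print(hierarchical_description(adj, text, "p0", k=2))
```

Edges that are not part of the BFS tree (here, the link between Paper 1 and Paper 2) are dropped by this simplified layout; a fuller description would need to list them separately to preserve the complete topology.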