Graph-based Approaches and Functionalities in Retrieval-Augmented Generation: A Comprehensive Survey
Zulun Zhu, Tiancheng Huang, Kai Wang, Junda Ye, Xinghe Chen, Siqiang Luo
TL;DR
This survey presents a graph-centric view of Retrieval-Augmented Generation (RAG), arguing that graph data management (via existing and text-derived knowledge graphs, industrial graph databases, and graph-produced structures) offers powerful grounding and multi-hop reasoning capabilities for LLMs. It introduces a taxonomy that spans database construction, retrieval algorithms (non-parameterized and learning-based), prompting strategies (topology-aware and text prompting), and graph-structured pipelines (sequential, loop, tree). It also analyzes graph-oriented tasks (KGQA, graph tasks, domain-specific applications) with performance and practicality considerations. The work consolidates insights from over 200 studies to articulate current challenges (freshness, scalability, provenance, explainability) and outlines future directions, including adaptive prompts, multi-modal graphs, dynamic graph handling, and user interaction. Overall, graph-based RAG demonstrates substantial potential to improve factual grounding, reasoning depth, and explainability in LLM-driven systems across domains, informing both research and industrial deployment.
Abstract
Large language models (LLMs) are prone to factual errors during inference because their training data is insufficient and lacks the most up-to-date knowledge, which leads to the hallucination problem. Retrieval-Augmented Generation (RAG) has gained attention as a promising way to address this limitation: it retrieves relevant information from external sources to generate more accurate answers. Given the pervasive presence of structured knowledge in these external sources, considerable strides have been made in RAG by employing graph-related techniques to achieve more complex reasoning based on the topological information between knowledge entities. However, there is currently neither a unified review examining the diverse roles of graphs in RAG nor a comprehensive resource to help researchers navigate and contribute to this evolving field. This survey offers a novel perspective on the functionality of graphs within RAG and their impact on enhancing performance across a wide range of graph-structured data. It provides a detailed breakdown of the roles that graphs play in RAG, covering database construction, algorithms, pipelines, and tasks. Finally, it identifies current challenges and outlines future research directions, aiming to inspire further developments in this field. Our graph-centered analysis highlights the commonalities and differences in existing methods, setting the stage for future researchers in areas such as graph learning, database systems, and natural language processing.
