Knowledge Graph-Guided Retrieval Augmented Generation

Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, Wei Hu

TL;DR

This work tackles LLM hallucinations by augmenting retrieval with structured knowledge graphs. It introduces KG$^2$RAG, which links chunks to a KG offline, performs semantic seed retrieval followed by graph-guided expansion, and then uses KG-based organization to assemble coherent, fact-rich paragraphs for prompting LLMs. Empirical results on HotpotQA and its variants show improvements in both response quality and retrieval quality, and ablation and robustness analyses support the contributions of graph-guided expansion and KG-based organization. The approach offers a scalable, efficient way to use structured knowledge in RAG, with a publicly released dataset and code to support adoption and further research.

Abstract

Retrieval-augmented generation (RAG) has emerged as a promising technology for addressing hallucination issues in the responses generated by large language models (LLMs). Existing studies on RAG primarily focus on applying semantic-based approaches to retrieve isolated relevant chunks, ignoring the intrinsic relationships among them. In this paper, we propose a novel Knowledge Graph-Guided Retrieval Augmented Generation (KG$^2$RAG) framework that utilizes knowledge graphs (KGs) to provide fact-level relationships between chunks, improving the diversity and coherence of the retrieved results. Specifically, after performing a semantic-based retrieval to provide seed chunks, KG$^2$RAG employs a KG-guided chunk expansion process and a KG-based chunk organization process to deliver relevant and important knowledge in well-organized paragraphs. Extensive experiments conducted on the HotpotQA dataset and its variants demonstrate the advantages of KG$^2$RAG compared to existing RAG-based approaches, in terms of both response quality and retrieval quality.
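
The three stages named in the abstract (seed retrieval, KG-guided expansion, KG-based organization) can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the function names (`seed_retrieval`, `expand`, `organize`), the token-overlap scoring used in place of embedding similarity, and the connected-component grouping are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical toy corpus: chunk id -> (text, triplets linked to the chunk offline).
CHUNKS = {
    "c1": ("Alice directed the film Moonrise.", [("Alice", "directed", "Moonrise")]),
    "c2": ("Moonrise was filmed in Oslo.", [("Moonrise", "filmed_in", "Oslo")]),
    "c3": ("Oslo is the capital of Norway.", [("Oslo", "capital_of", "Norway")]),
    "c4": ("Bob plays chess.", [("Bob", "plays", "chess")]),
}


def tokens(text):
    """Lowercased word set with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def seed_retrieval(query, k=1):
    """Rank chunks by token overlap (an assumed stand-in for semantic similarity)."""
    q = tokens(query)
    ranked = sorted(CHUNKS, key=lambda c: (-len(q & tokens(CHUNKS[c][0])), c))
    return ranked[:k]


def build_entity_index():
    """Map each entity to the chunks whose triplets mention it."""
    ent2chunks = defaultdict(set)
    for cid, (_, triplets) in CHUNKS.items():
        for head, _, tail in triplets:
            ent2chunks[head].add(cid)
            ent2chunks[tail].add(cid)
    return ent2chunks


def expand(seeds, hops=1):
    """Breadth-first expansion: chunks are neighbours if they share an entity."""
    ent2chunks = build_entity_index()
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        cid, depth = frontier.popleft()
        if depth == hops:
            continue
        for head, _, tail in CHUNKS[cid][1]:
            for entity in (head, tail):
                for nb in sorted(ent2chunks[entity]):
                    if nb not in seen:
                        seen.add(nb)
                        frontier.append((nb, depth + 1))
    return seen


def organize(chunk_ids):
    """Group chunks connected through shared entities into one paragraph each."""
    ent2chunks = build_entity_index()
    remaining = set(chunk_ids)
    paragraphs = []
    for start in sorted(chunk_ids):
        if start not in remaining:
            continue
        remaining.discard(start)
        component, stack = [], [start]
        while stack:
            cid = stack.pop()
            component.append(cid)
            for head, _, tail in CHUNKS[cid][1]:
                for entity in (head, tail):
                    for nb in ent2chunks[entity]:
                        if nb in remaining:
                            remaining.discard(nb)
                            stack.append(nb)
        paragraphs.append(" ".join(CHUNKS[c][0] for c in sorted(component)))
    return paragraphs


seeds = seed_retrieval("Who directed Moonrise?", k=1)
context = organize(expand(seeds, hops=2))
```

In this toy setting, a question about the film reaches the chunk about Norway only through the shared entities `Moonrise` and `Oslo`, which is the kind of fact-level link between chunks that similarity-only retrieval would miss.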

Paper Structure

This paper contains 30 sections, 8 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: A comparison among LLM-only, Semantic RAG, and Graph RAG paradigms.
  • Figure 2: Workflow of the proposed KG$^2$RAG.
  • Figure 3: The prompt for triplet extraction.
  • Figure 4: Statistics of triplet extraction.
  • Figure 5: Experimental results with varying top-$k$ on HotpotQA in distractor setting.