KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models

Weijie Chen, Ting Bai, Jinbo Su, Jian Luan, Wei Liu, Chuan Shi

TL;DR

KG-Retriever addresses multi-hop, knowledge-intensive QA by introducing a Hierarchical Index Graph (HIG) that combines a knowledge-graph layer with a collaborative document layer. This two-layer graph enables two-stage retrieval: document-level collaboration first gathers cross-document context, and entity-level KG extraction and retrieval then refines the answer with concise inter-entity relationships. The approach achieves state-of-the-art performance on five open-domain QA datasets using a single retrieval step, while significantly reducing generation time compared with iterative methods. The architecture enhances cross-document reasoning and information fusion, offering a practical, efficient RAG solution for large language models operating on large corpora. Potential extensions include dynamic indexing for evolving corpora and application to NLP tasks beyond QA.
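
The two-stage retrieval described above can be illustrated with a toy sketch. This is not the authors' implementation: the class `HierarchicalIndex`, its parameters (`k`, `top_docs`, `top_triples`), the hand-written triples, and the word-overlap (Jaccard) similarity are all assumptions made for illustration, standing in for whatever embedding model and LLM-based triple extraction a real system would use.

```python
def tokens(text):
    """Lowercased word set; a crude stand-in for a real embedding model."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class HierarchicalIndex:
    """Hypothetical two-layer index: document neighbors plus per-document triples."""

    def __init__(self, docs, triples_by_doc, k=1):
        self.docs = docs                # doc_id -> text
        self.triples = triples_by_doc   # doc_id -> [(head, relation, tail)]
        self.doc_tokens = {d: tokens(t) for d, t in docs.items()}
        # Collaborative document layer: link each document to its k most
        # similar documents (assumed similarity: Jaccard over words).
        self.neighbors = {}
        for d in docs:
            others = [o for o in docs if o != d]
            others.sort(
                key=lambda o: jaccard(self.doc_tokens[d], self.doc_tokens[o]),
                reverse=True,
            )
            self.neighbors[d] = others[:k]

    def retrieve(self, question, top_docs=1, top_triples=2):
        q = tokens(question)
        # Stage 1 (document level): pick seed documents, then expand each
        # seed with its neighbors from the collaborative layer.
        seeds = sorted(
            self.docs,
            key=lambda d: jaccard(q, self.doc_tokens[d]),
            reverse=True,
        )[:top_docs]
        candidates = list(seeds)
        for d in seeds:
            for n in self.neighbors[d]:
                if n not in candidates:
                    candidates.append(n)
        # Stage 2 (entity level): rank the triples of the candidate
        # documents against the question and keep the best ones.
        pool = [t for d in candidates for t in self.triples.get(d, [])]
        pool.sort(key=lambda t: jaccard(q, tokens(" ".join(t))), reverse=True)
        return candidates, pool[:top_triples]


# Toy corpus; the triples are hand-written for this example.
docs = {
    "d1": "Marie Curie was born in Warsaw and won the Nobel Prize.",
    "d2": "Warsaw is the capital of Poland.",
    "d3": "Bananas are rich in potassium.",
}
triples = {
    "d1": [("Marie Curie", "born in", "Warsaw")],
    "d2": [("Warsaw", "capital of", "Poland")],
    "d3": [("Bananas", "rich in", "potassium")],
}
index = HierarchicalIndex(docs, triples, k=1)
docs_hit, triples_hit = index.retrieve("In which country was Marie Curie born?")
```

In this toy corpus, `d2` shares no words with the question and is reached only through its link to `d1` in the document layer, which is the kind of cross-document hop the collaborative layer is meant to supply within a single retrieval pass.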

Abstract

Large language models with retrieval-augmented generation encounter a pivotal challenge in intricate retrieval tasks, e.g., multi-hop question answering, which require the model to navigate across multiple documents and generate comprehensive responses based on fragmented information. To tackle this challenge, we introduce a novel Knowledge Graph-based RAG framework with a hierarchical knowledge retriever, termed KG-Retriever. The retrieval index in KG-Retriever is built on a hierarchical index graph that consists of a knowledge graph layer and a collaborative document layer. The associative nature of graph structures is exploited to strengthen intra-document and inter-document connectivity, thereby alleviating the information fragmentation problem while also improving the efficiency of cross-document retrieval for LLMs. With the coarse-grained collaborative information from neighboring documents and concise information from the knowledge graph, KG-Retriever achieves marked improvements on five public QA datasets, showing the effectiveness and efficiency of our proposed RAG framework.

Paper Structure

This paper contains 19 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The overview architecture of KG-Retriever. It consists of three components: the indexing construction component based on a hierarchical index graph (HIG), the knowledge retrieval component, and the response generation component.
  • Figure 2: The performance of RAG methods with different LLM backbones (Qwen-7B, Qwen-14B and GPT-4).
  • Figure 3: Hyperparameter analysis on the HotpotQA and CRUD-QA2 datasets. $K$ is a hyperparameter that selects the most similar neighbors, and $T$ and $\lambda$ are hyperparameters that control the number of retrieved triples.