KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models
Weijie Chen, Ting Bai, Jinbo Su, Jian Luan, Wei Liu, Chuan Shi
TL;DR
KG-Retriever addresses multi-hop, knowledge-intensive QA by introducing a Hierarchical Index Graph (HIG) that combines a knowledge-graph layer with a collaborative document layer. This two-layer graph enables two-stage retrieval: document-level collaboration first gathers cross-document context, and entity-level KG extraction and retrieval then refines the answer with concise inter-entity relationships. The approach achieves state-of-the-art performance on five open-domain QA datasets using a single retrieval step, while significantly reducing generation time compared with iterative methods. The architecture improves cross-document reasoning and information fusion, offering a practical, efficient RAG solution for large language models operating on large corpora. Potential extensions include dynamic indexing for evolving corpora and application to NLP tasks beyond QA.
Abstract
Large language models with retrieval-augmented generation encounter a pivotal challenge in intricate retrieval tasks, e.g., multi-hop question answering, which requires the model to navigate across multiple documents and generate comprehensive responses from fragmented information. To tackle this challenge, we introduce a novel Knowledge Graph-based RAG framework with a hierarchical knowledge retriever, termed KG-Retriever. The retrieval index in KG-Retriever is built on a hierarchical index graph that consists of a knowledge graph layer and a collaborative document layer. The associative nature of graph structures is used to strengthen intra-document and inter-document connectivity, thereby alleviating the information fragmentation problem while also improving the efficiency of cross-document retrieval for LLMs. With the coarse-grained collaborative information from neighboring documents and concise information from the knowledge graph, KG-Retriever achieves marked improvements on five public QA datasets, showing the effectiveness and efficiency of our proposed RAG framework.
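To make the two-layer idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that each document arrives with hand-written (head, relation, tail) triples, that two documents are linked when they share an entity, and that token overlap stands in for a learned retriever; the class name `HierarchicalIndex` and all of its methods are hypothetical. It illustrates a single retrieval step that first expands to neighboring documents and then collects their triples.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real encoder (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


class HierarchicalIndex:
    """Toy two-layer index: a document layer plus per-document triples."""

    def __init__(self):
        self.docs = {}                      # doc_id -> raw text
        self.triples = {}                   # doc_id -> [(head, relation, tail)]
        self.neighbors = defaultdict(set)   # doc_id -> linked doc_ids

    def add_document(self, doc_id, text, triples):
        self.docs[doc_id] = text
        self.triples[doc_id] = list(triples)

    def link_documents(self):
        # Assumption: two documents are neighbors if they share an entity.
        entities = {
            d: {e for head, _, tail in ts for e in (head, tail)}
            for d, ts in self.triples.items()
        }
        ids = sorted(self.docs)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if entities[a] & entities[b]:
                    self.neighbors[a].add(b)
                    self.neighbors[b].add(a)

    def retrieve(self, query, top_k=1):
        q = tokenize(query)
        # Stage 1 (document level): rank by token overlap, then add neighbors.
        ranked = sorted(
            self.docs, key=lambda d: (-len(q & tokenize(self.docs[d])), d)
        )
        candidates = set(ranked[:top_k])
        for d in ranked[:top_k]:
            candidates |= self.neighbors[d]
        # Stage 2 (entity level): gather the triples of the candidate documents.
        doc_ids = sorted(candidates)
        facts = [t for d in doc_ids for t in self.triples[d]]
        return doc_ids, facts


index = HierarchicalIndex()
index.add_document("d1", "Marie Curie was born in Warsaw.",
                   [("Marie Curie", "born_in", "Warsaw")])
index.add_document("d2", "Warsaw is the capital of Poland.",
                   [("Warsaw", "capital_of", "Poland")])
index.add_document("d3", "Paris is the capital of France.",
                   [("Paris", "capital_of", "France")])
index.link_documents()

docs, facts = index.retrieve("Where was Marie Curie born?", top_k=1)
print(docs)
print(facts)
```

In this toy corpus the query matches only the first document lexically, yet the second document is pulled in through the shared entity "Warsaw", which is the kind of cross-document hop the hierarchical index is meant to support. A real system would replace the overlap score and the hand-written triples with learned components.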
