Graph-Guided Concept Selection for Efficient Retrieval-Augmented Generation

Ziyu Liu, Yijing Liu, Jianfei Yuan, Minzhi Yan, Le Yue, Honghui Xiong, Yi Yang

TL;DR

GraphRAG enhances QA by leveraging a knowledge graph but incurs prohibitive construction costs. The authors propose G2ConS, which combines Core Chunk Selection with an LLM-independent Concept Graph to identify high-value concepts and prune input chunks, enabling dual-path retrieval over both the concept graph and a core-KG with a weighted ensemble. Key contributions include a concept-graph construction that blends semantic relevance and co-occurrence, a dual-path retrieval strategy with local and global reranking, and ablation and parameter studies demonstrating favorable cost–performance trade-offs across Musique, HotpotQA, and 2WikiMultihopQA. The results show improved QA quality and reduced construction costs, supporting scalable retrieval-augmented QA in multi-hop and domain-specific settings.

Abstract

Graph-based RAG constructs a knowledge graph (KG) from text chunks to enhance retrieval in Large Language Model (LLM)-based question answering. It is especially beneficial in domains such as biomedicine, law, and political science, where effective retrieval often involves multi-hop reasoning over proprietary documents. However, these methods demand numerous LLM calls to extract entities and relations from text chunks, incurring prohibitive costs at scale. Through a carefully designed ablation study, we observe that certain words (termed concepts) and their associated documents are more important. Based on this insight, we propose Graph-Guided Concept Selection (G2ConS). Its core comprises a chunk selection method and an LLM-independent concept graph. The former selects salient document chunks to reduce KG construction costs; the latter closes knowledge gaps introduced by chunk selection at zero cost. Evaluations on multiple real-world datasets show that G2ConS outperforms all baselines in construction cost, retrieval effectiveness, and answering quality.
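The idea of an LLM-independent concept graph guiding chunk selection can be illustrated with a small sketch. This is not the authors' implementation: it assumes concepts are simply non-stopword tokens, uses only co-occurrence edges (the semantic-relevance component is omitted), and ranks chunks by the summed weighted degree of their concepts. The function names, the stopword list, and the `budget` parameter are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "to", "by"}


def extract_concepts(chunk):
    """Stand-in for concept extraction: lowercased word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", chunk.lower()) if w not in STOPWORDS}


def build_cooccurrence_graph(chunks):
    """Edge weight = number of chunks in which two concepts appear together."""
    edges = Counter()
    for chunk in chunks:
        for a, b in combinations(sorted(extract_concepts(chunk)), 2):
            edges[(a, b)] += 1
    return edges


def concept_scores(edges):
    """Weighted degree of each concept (an assumed importance measure)."""
    scores = Counter()
    for (a, b), weight in edges.items():
        scores[a] += weight
        scores[b] += weight
    return scores


def select_core_chunks(chunks, budget):
    """Return indices of the `budget` chunks whose concepts score highest."""
    scores = concept_scores(build_cooccurrence_graph(chunks))
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: (-sum(scores[c] for c in extract_concepts(chunks[i])), i),
    )
    return sorted(ranked[:budget])


chunks = [
    "Paris is the capital of France",
    "France borders Spain",
    "Bananas are yellow",
]
core = select_core_chunks(chunks, budget=2)
```

In this toy corpus the two chunks that share the concept "france" are selected, and only those would be passed to the costly LLM-based entity and relation extraction. No LLM call is needed to build the graph or compute the ranking.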

Paper Structure

This paper contains 16 sections, 1 equation, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The design and results of the concept deletion experiment. (a) We divide text chunks into different groups based on their associated words, referred to as concepts. (b) By deleting concepts in different orders, we find that some concepts have greater importance.
  • Figure 2: Overview of the Proposed G2ConS. (1) We extract concepts from text chunks and construct a concept graph based on semantic and co-occurrence relations. (2) We perform core chunk selection and build a low-cost core knowledge graph (core-KG). (3) G2ConS leverages dual-path retrieval to effectively utilize both the concept graph and the core-KG.
  • Figure 3: Construction Overhead vs. Performance on Musique.
  • Figure 4: Answer quality by varying $\kappa$ and $\lambda$.