LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

Yaoze Zhang, Rong Wu, Pinlong Cai, Xiaoman Wang, Guohang Yan, Song Mao, Ding Wang, Botian Shi

TL;DR

This work targets knowledge-grounded generation by addressing two key limitations of KG-based RAG: disconnected semantic islands in high-level summaries and retrieval that ignores graph topology. LeanRAG combines deep semantic aggregation, which builds a multi-level, navigable knowledge graph with explicit inter-cluster relations, with a bottom-up, structure-guided retrieval strategy based on Lowest Common Ancestor traversal that gathers concise yet comprehensive evidence. The approach yields state-of-the-art QA performance across four diverse benchmarks and reduces retrieval redundancy by about 46%, while ablations show the critical roles of inter-cluster relations and access to original textual context. These findings demonstrate a scalable, efficient framework for coherent, knowledge-grounded generation and offer practical benefits for multi-domain QA tasks.

Abstract

Retrieval-Augmented Generation (RAG) plays a crucial role in grounding Large Language Models by leveraging external knowledge, but its effectiveness is often compromised by the retrieval of contextually flawed or incomplete information. To address this, knowledge graph-based RAG methods have evolved towards hierarchical structures, organizing knowledge into multi-level summaries. However, these approaches still suffer from two critical, unaddressed challenges: high-level conceptual summaries exist as disconnected "semantic islands", lacking the explicit relations needed for cross-community reasoning; and the retrieval process itself remains structurally unaware, often degenerating into an inefficient flat search that fails to exploit the graph's rich topology. To overcome these limitations, we introduce LeanRAG, a framework that features a deeply collaborative design combining knowledge aggregation and retrieval strategies. LeanRAG first employs a novel semantic aggregation algorithm that forms entity clusters and constructs new explicit relations among aggregation-level summaries, creating a fully navigable semantic network. Then, a bottom-up, structure-guided retrieval strategy anchors queries to the most relevant fine-grained entities and systematically traverses the graph's semantic pathways to gather concise yet contextually comprehensive evidence sets. LeanRAG mitigates the substantial overhead associated with path retrieval on graphs and minimizes redundant information retrieval. Extensive experiments on four challenging QA benchmarks from different domains demonstrate that LeanRAG significantly outperforms existing methods in response quality while reducing retrieval redundancy by 46%. Code is available at: https://github.com/RaZzzyz/LeanRAG
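
To make the bottom-up, Lowest Common Ancestor (LCA) style of retrieval more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy hierarchy stored as child-to-parent pointers, with seed entities already chosen for a query; the entity names, the `PARENT` table, and the helper functions are all hypothetical. It collects only the nodes on each seed's path up to the seeds' lowest common ancestor, which is one simple way to bound the evidence set instead of searching the whole graph. The inter-cluster relations that LeanRAG constructs are omitted here.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: child -> parent (None marks the root).
# Leaves stand in for fine-grained entities, inner nodes for cluster summaries.
PARENT: Dict[str, Optional[str]] = {
    "insulin": "hormones",
    "glucagon": "hormones",
    "hormones": "endocrinology",
    "pancreas": "organs",
    "organs": "anatomy",
    "endocrinology": "medicine",
    "anatomy": "medicine",
    "medicine": None,
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def lowest_common_ancestor(nodes: List[str]) -> str:
    """Return the deepest node that lies on every seed's path to the root."""
    paths = [path_to_root(n) for n in nodes]
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # paths[0] is ordered bottom-up, so the first shared node is the deepest.
    return next(n for n in paths[0] if n in common)


def evidence_nodes(seeds: List[str]) -> List[str]:
    """Collect each seed's ancestors up to (and including) the LCA, deduplicated."""
    lca = lowest_common_ancestor(seeds)
    collected: List[str] = []
    for seed in seeds:
        for node in path_to_root(seed):
            if node not in collected:
                collected.append(node)
            if node == lca:
                break  # do not climb above the shared ancestor
    return collected


if __name__ == "__main__":
    print(evidence_nodes(["insulin", "glucagon"]))
    print(evidence_nodes(["insulin", "pancreas"]))
```

In this toy setup, two seeds in the same cluster stop at their shared cluster summary, while seeds from distant clusters climb further before meeting, so the size of the evidence set tracks how far apart the seeds sit in the hierarchy.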

Paper Structure

This paper contains 41 sections, 9 equations, 3 figures, and 11 tables.

Figures (3)

  • Figure 1: Comparison of typical LLM retrieval-augmented generation frameworks.
  • Figure 2: Overview of the LeanRAG framework.
  • Figure 3: Comparison of retrieval tokens across four datasets.