Core-based Hierarchies for Efficient GraphRAG

Jakir Hossain; Ahmet Erdem Sarıyüce

TL;DR

This work introduces a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs.

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
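To make the "deterministic, linear-time" claim concrete, here is a minimal sketch of standard k-core peeling with degree buckets. It is not the authors' implementation, and the names `core_numbers` and `k_core_nodes` are hypothetical helpers introduced only for illustration. It assumes an undirected simple graph given as an edge list. The core numbers are unique for a given graph, so they do not depend on the order in which nodes are peeled.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as an edge list.

    Illustrative bucket-based peeling (assumed helper, not from the paper):
    repeatedly remove a node of minimum remaining degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets collapse duplicate edges
            adj[u].add(v)
            adj[v].add(u)

    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] holds the not-yet-removed nodes whose remaining degree is d
    buckets = [set() for _ in range(max_deg + 1)]
    for u, d in degree.items():
        buckets[d].add(u)

    core = {}
    current = 0  # largest minimum degree seen so far during peeling
    d = 0        # pointer to the lowest bucket that may be non-empty
    remaining = len(degree)
    while remaining:
        if not buckets[d]:
            d += 1
            continue
        u = buckets[d].pop()
        current = max(current, d)
        core[u] = current
        remaining -= 1
        for v in adj[u]:
            if v in core:
                continue  # neighbor already peeled
            buckets[degree[v]].discard(v)
            degree[v] -= 1
            buckets[degree[v]].add(v)
            if degree[v] < d:
                d = degree[v]  # pointer moves down by at most one per edge
    return core


def k_core_nodes(edges, k):
    """Nodes belonging to the k-core, i.e. those with core number >= k."""
    return {u for u, c in core_numbers(edges).items() if c >= k}
```

For example, on a triangle with one pendant node, `core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])` assigns core number 2 to the triangle nodes and 1 to the pendant. Nesting the sets `k_core_nodes(edges, k)` for increasing `k` gives the kind of density-aware hierarchy the abstract refers to.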

Paper Structure (28 sections, 1 theorem, 1 equation, 1 figure, 12 tables, 4 algorithms)

Introduction
Related Work and Background
Community-based GraphRAG Overview
Hierarchical $k$-core Decomposition
Why Modularity Optimization is Unreliable on Sparse Knowledge Graphs?
A Robust Alternative: $k$-core Decomposition
Handling Residuals in $k$-core Hierarchy
Handling Small Clusters
Token Efficiency via Sampling
Experimental Setup
Evaluation Criteria
Results and Analysis
Results on Post–Cutoff Data
Evaluation on Full Data by GPT-5-mini
Statistical Analysis
...and 13 more sections

Key Result

Theorem 1

Let $G$ be a graph with $n$ nodes, $m$ edges, average degree $\bar{k} = 2m/n = O(1)$, and $n_{\le d} = \Theta(n)$ nodes of degree at most $d$. Then for any $\varepsilon > d(2+\bar{k})/(2m)$, the number of partitions that are near-optimal in modularity, up to tolerance $\varepsilon$, is exponential in $n$. In particular, the tolerance threshold $\varepsilon$ required to trigger this blowup is $O(1/n)$.
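The $O(1/n)$ scaling of the threshold can be checked directly from the definitions in the statement. This short derivation assumes that $d$ is a fixed constant and that $\bar{k}$ is bounded away from zero (for instance, $\bar{k} \ge 1$ in any connected graph with $n \ge 2$). Both assumptions are ours and are not stated in the extract above. Substituting $2m = \bar{k}\,n$ gives:

$$\frac{d\,(2+\bar{k})}{2m} \;=\; \frac{d\,(2+\bar{k})}{\bar{k}\,n} \;=\; \frac{d}{n}\left(1 + \frac{2}{\bar{k}}\right) \;=\; O\!\left(\frac{1}{n}\right).$$

As a numerical illustration with hypothetical values $d = 2$, $\bar{k} = 4$, and $n = 10^{4}$, the threshold is $2 \cdot 6 / (4 \cdot 10^{4}) = 3 \times 10^{-4}$.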

Figures (1)

Figure 1: $k$-core decomposition (left) and corresponding hierarchy tree produced by RkH (right).

Theorems & Definitions (1)

Theorem 1: Modularity Degeneracy in Sparse Graphs
