Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems

Jovan Pavlović, Miklós Krész, László Hajdu

TL;DR

The paper tackles retrieval shortcomings in multi-hop QA for RAG systems by introducing a spreading-activation–based retrieval framework operating over automatically constructed knowledge graphs. This approach enables associative, multi-step retrieval that complements LLM reasoning without requiring model retraining, and it plugs into existing RAG pipelines. Empirical results on two multi-hop benchmarks show SA-RAG either matches or exceeds iterative RAG baselines, with particularly strong gains when combined with chain-of-thought prompts and using small open-weight LLMs. The work emphasizes plug-and-play integration and resource-efficient reasoning, highlighting practical potential for robust, scalable reasoning over large document corpora.

Abstract

Despite initial successes and a variety of architectures, retrieval-augmented generation (RAG) systems still struggle to reliably retrieve and connect the multi-step evidence required for complicated reasoning tasks. Most of the standard RAG frameworks regard all retrieved information as equally reliable, overlooking the varying credibility and interconnected nature of large textual corpora. GraphRAG approaches offer potential improvement to RAG systems by integrating knowledge graphs, which structure information into nodes and edges, capture entity relationships, and enable multi-step logical traversal. However, GraphRAG is not always an ideal solution as it depends on high-quality graph representations of the corpus, which requires either pre-existing knowledge graphs that are expensive to build and update, or automated graph construction pipelines that are often unreliable. Moreover, systems following this paradigm typically use large language models to guide graph traversal and evidence retrieval, leading to challenges similar to those encountered with standard RAG. In this paper, we propose a novel RAG framework that employs the spreading activation algorithm to retrieve information from a corpus of documents interconnected by automatically constructed knowledge graphs, thereby enhancing the performance of large language models on complex tasks such as multi-hop question answering. Experiments show that our method achieves better or comparable performance to iterative RAG methodologies, while also being easily integrable as a plug-and-play module with a wide range of RAG-based approaches. Combining our method with chain-of-thought iterative retrieval yields up to a 39% absolute gain in answer correctness compared to naive RAG, achieving these results with small open-weight language models and highlighting its effectiveness in resource-constrained settings.

Paper Structure

This paper contains 19 sections, 3 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: High-level overview of methodology
  • Figure 2: Knowledge graph creation during the indexing phase. The figure shows an example graph constructed from a single document describing the capital cities of European states. The document node is marked in green, entity description nodes are colored orange, and entity nodes are shown in blue.
  • Figure 3: Subgraph fetching step: Orange nodes represent the top-k entity description nodes, while entity nodes are shown in blue. Links of the describes type are colored orange, whereas blue links represent related_to relations connecting entity nodes. Initially activated “seed” entities are indicated by a red border.
  • Figure 4: Spreading activation on the subgraph fetched for the query 3hop2__655849_223623_162182 from the MuSiQue dataset. Golden entities are marked in yellow, activated entities in red, unactivated entities in light blue, and activated golden entities in pink.
  • Figure 5: Effect of applying a linear normalization factor to the edge weights of the fetched subgraph on spreading activation dynamics. The first column corresponds to $c = 0.5$, the second to $c = 0.4$, and the third to $c = 0.3$.
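
The captions for Figures 3 to 5 refer to seed entities, activated and unactivated entities, and a factor $c$ applied to edge weights. As a rough illustration of how these pieces can interact, here is a minimal sketch of generic spreading activation in Python. It is not the paper's algorithm: the function name `spread_activation`, the firing `threshold`, the additive update capped at 1.0, and the choice to apply `c` as a plain multiplier on every edge weight are all assumptions made for this example, and the toy graph is invented.

```python
from collections import defaultdict


def spread_activation(edges, seeds, threshold=0.5, c=1.0, max_steps=3):
    """Generic spreading activation on a weighted, undirected graph.

    edges:     iterable of (u, v, weight) triples, weight in [0, 1]
    seeds:     nodes that start with activation 1.0
    threshold: a node fires (propagates) only at or above this value (assumed rule)
    c:         hypothetical scaling factor multiplied into every edge weight
    max_steps: upper bound on propagation rounds
    Returns a dict of nodes whose final activation reaches the threshold.
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w * c))
        neighbors[v].append((u, w * c))

    activation = defaultdict(float)
    for s in seeds:
        activation[s] = 1.0

    fired = set()          # each node propagates at most once
    frontier = set(seeds)  # nodes whose activation changed in the last round
    for _ in range(max_steps):
        # Snapshot the nodes that fire this round so the result does not
        # depend on iteration order.
        firing = {n: activation[n] for n in frontier
                  if n not in fired and activation[n] >= threshold}
        if not firing:
            break
        fired.update(firing)
        frontier = set()
        for node, value in firing.items():
            for nb, w in neighbors[node]:
                if nb not in fired:
                    # Additive update, capped at 1.0 (an assumption of this sketch).
                    activation[nb] = min(1.0, activation[nb] + value * w)
                    frontier.add(nb)

    return {n: a for n, a in activation.items() if a >= threshold}


# Invented toy graph, for illustration only.
toy_edges = [("Paris", "France", 0.9), ("France", "Europe", 0.8), ("Paris", "Seine", 0.3)]
full = spread_activation(toy_edges, seeds=["Paris"], c=1.0)
damped = spread_activation(toy_edges, seeds=["Paris"], c=0.5)
```

In this toy setup, `full` contains Paris, France, and Europe, while `damped` contains only the seed, because scaling every weight by 0.5 keeps each neighbor below the firing threshold. This only illustrates that a scaling factor on edge weights can change how far activation travels under these assumed rules; it does not reproduce the dynamics reported in Figure 5.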