Leveraging Spreading Activation for Improved Document Retrieval in Knowledge-Graph-Based RAG Systems
Jovan Pavlović, Miklós Krész, László Hajdu
TL;DR
The paper tackles retrieval shortcomings in multi-hop question answering (QA) for RAG systems by introducing SA-RAG, a retrieval framework based on spreading activation over automatically constructed knowledge graphs. The approach enables associative, multi-step retrieval that complements LLM reasoning, requires no model retraining, and plugs into existing RAG pipelines. On two multi-hop benchmarks, SA-RAG matches or exceeds iterative RAG baselines, with particularly strong gains when it is combined with chain-of-thought prompting and run with small open-weight LLMs. The work emphasizes plug-and-play integration and resource-efficient reasoning, pointing to practical potential for robust, scalable reasoning over large document corpora.
Abstract
Despite initial successes and a variety of architectures, retrieval-augmented generation (RAG) systems still struggle to reliably retrieve and connect the multi-step evidence required for complex reasoning tasks. Most standard RAG frameworks treat all retrieved information as equally reliable, overlooking the varying credibility and interconnected nature of large textual corpora. GraphRAG approaches can improve RAG systems by integrating knowledge graphs, which structure information into nodes and edges, capture entity relationships, and enable multi-step logical traversal. However, GraphRAG is not always an ideal solution, as it depends on high-quality graph representations of the corpus. These require either pre-existing knowledge graphs, which are expensive to build and update, or automated graph construction pipelines, which are often unreliable. Moreover, systems following this paradigm typically use large language models to guide graph traversal and evidence retrieval, leading to challenges similar to those encountered with standard RAG. In this paper, we propose a novel RAG framework that employs the spreading activation algorithm to retrieve information from a corpus of documents interconnected by automatically constructed knowledge graphs, thereby enhancing the performance of large language models on complex tasks such as multi-hop question answering. Experiments show that our method achieves performance better than or comparable to iterative RAG methodologies, while also being easy to integrate as a plug-and-play module with a wide range of RAG-based approaches. Combining our method with chain-of-thought iterative retrieval yields up to a 39% absolute gain in answer correctness compared to naive RAG; these results are achieved with small open-weight language models, highlighting the method's effectiveness in resource-constrained settings.
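The abstract does not spell out the propagation rule, so the following is only an illustrative sketch of a common textbook form of spreading activation, not the authors' SA-RAG implementation. It assumes synchronous propagation with a multiplicative decay factor, a firing threshold, and a hop limit over a weighted, directed entity graph, and then ranks documents by the highest activation among the entities they mention. The function names `spreading_activation` and `retrieve_documents`, the parameters `decay`, `threshold`, and `max_hops`, the max-based document score, and the toy graph are all hypothetical. Seed entities would in practice come from the question (for example via entity linking), which is outside this sketch.

```python
from collections import defaultdict


def spreading_activation(graph, seeds, decay=0.8, threshold=0.2, max_hops=3):
    """Propagate activation from seed entities over a weighted, directed graph.

    graph: dict mapping node -> list of (neighbor, edge_weight) pairs,
    with weights assumed to lie in [0, 1] (hypothetical representation).
    Returns a dict of nodes whose final activation is >= threshold.
    """
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = 1.0  # seeds start fully activated
    frontier = set(seeds)
    fired = set()  # each node fires (propagates) at most once
    for _ in range(max_hops):
        incoming = defaultdict(float)
        for node in frontier:
            a = activation[node]
            if node in fired or a < threshold:
                continue  # below the firing threshold: do not propagate
            fired.add(node)
            for neighbor, weight in graph.get(node, ()):
                incoming[neighbor] += a * weight * decay
        if not incoming:
            break  # nothing left to propagate
        # Apply all updates after the hop, so results do not depend on set order.
        for neighbor, delta in incoming.items():
            activation[neighbor] = min(1.0, activation[neighbor] + delta)
        frontier = {n for n in incoming if n not in fired}
    return {n: a for n, a in activation.items() if a >= threshold}


def retrieve_documents(doc_entities, activation, top_k=2):
    """Rank documents by the highest activation among the entities they mention."""
    scores = {}
    for doc_id, entities in doc_entities.items():
        score = max((activation.get(e, 0.0) for e in entities), default=0.0)
        if score > 0.0:  # skip documents with no activated entity
            scores[doc_id] = score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Toy multi-hop example (hypothetical data).
graph = {
    "Marie Curie": [("Warsaw", 0.9), ("Sorbonne", 0.8), ("Chemistry", 0.2)],
    "Warsaw": [("Poland", 0.9)],
    "Sorbonne": [("Paris", 0.9)],
}
doc_entities = {
    "d1": ["Marie Curie", "Sorbonne"],
    "d2": ["Warsaw", "Poland"],
    "d3": ["Chemistry"],
    "d4": ["Paris"],
}
act = spreading_activation(graph, seeds=["Marie Curie"])
print(retrieve_documents(doc_entities, act, top_k=3))  # ['d1', 'd2', 'd4']
```

Here "Poland" is reached in two hops from the seed, so the document about it is retrieved even though it never mentions the seed entity, while the weakly connected "Chemistry" node (activation 0.16) stays below the threshold and its document is dropped. Taking the maximum entity activation per document is one simple choice; summing activations would instead favor documents that mention many activated entities.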
