DA-RAG: Dynamic Attributed Community Search for Retrieval-Augmented Generation
Xingyuan Zeng, Zuohan Wu, Yue Wang, Chen Zhang, Quanming Yao, Libin Zheng, Jian Yin
TL;DR
DA-RAG addresses the limitations of static, low-order graph approaches in retrieval-augmented generation by introducing Embedding-Attributed Community Search (EACS), which dynamically extracts high-order, query-relevant subgraphs. It pairs an offline chunk-layer-based index with online coarse-to-fine retrieval across three layers (Semantic Chunk Layer, Knowledge Graph Layer, Similarity Layer). A $k$-truss constraint and a query-relevance objective together yield provable cohesion and bounded reasoning hops. The online EACS problem is solved with the Q-Peel heuristic, whose complexity is $O(m^{1.5} + c n^2 t)$ and whose robustness is analyzed. In experiments on UltraDomain and News Articles, DA-RAG outperforms baselines by up to 40% in head-to-head comparisons across four metrics and reduces index construction time and token overhead by up to 37% and 41%, respectively. Overall, DA-RAG demonstrates that dynamic, multi-layered, high-order graph structures can substantially improve retrieval quality and efficiency for RAG systems, with practical impact for search, summarization, and knowledge-grounded AI applications.
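To make the $k$-truss constraint and the query-relevance objective more concrete, the following is a minimal sketch of a generic greedy peeling procedure. It is not the paper's Q-Peel algorithm, whose details are not given in this summary. The function names (`k_truss`, `peel_by_relevance`), the cosine-similarity score, the average-relevance objective, and the toy graph are all assumptions made for illustration. The sketch uses the common definition in which every edge of a $k$-truss lies in at least $k-2$ triangles.

```python
from math import sqrt


def k_truss(adj, k):
    """Return the maximal subgraph whose edges each lie in >= k - 2 triangles."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Edge support = number of common neighbours (triangles).
                if u < v and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {u: vs for u, vs in adj.items() if vs}  # drop isolated nodes


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def peel_by_relevance(adj, emb, query, k):
    """Hypothetical greedy peel: drop the least relevant node, restore the
    k-truss, and remember the community with the best average relevance."""
    score = {u: cosine(emb[u], query) for u in adj}
    current = k_truss(adj, k)
    best, best_avg = current, float("-inf")
    while current:
        avg = sum(score[u] for u in current) / len(current)
        if avg > best_avg:
            best, best_avg = current, avg
        worst = min(current, key=lambda u: score[u])
        remaining = {u: vs - {worst} for u, vs in current.items() if u != worst}
        current = k_truss(remaining, k)
    return set(best), best_avg


# Toy graph (assumed data): a 5-clique on nodes 0..4 plus a pendant node 5.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2, 3, 4},
    2: {0, 1, 3, 4},
    3: {0, 1, 2, 4},
    4: {0, 1, 2, 3, 5},
    5: {4},
}
emb = {
    0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.8, 0.2],
    3: [0.9, 0.2], 4: [0.0, 1.0], 5: [1.0, 0.0],
}
query = [1.0, 0.0]

community, avg_relevance = peel_by_relevance(adj, emb, query, k=4)
print(sorted(community))  # [0, 1, 2, 3]
```

In this toy run, node 5 matches the query but is excluded because its only edge lies in no triangle, and node 4 is cohesive but irrelevant, so peeling it raises the average relevance. Removing any further node would break the 4-truss, so the search stops at the 4-clique.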
Abstract
Owing to their unprecedented comprehension capabilities, large language models (LLMs) have become indispensable components of modern web search engines. From a technical perspective, this integration is a form of retrieval-augmented generation (RAG), which enhances LLMs by grounding them in external knowledge bases. A prevalent technical approach in this context is graph-based RAG (G-RAG). However, current G-RAG methods frequently underutilize graph topology, focusing predominantly on low-order structures or pre-computed static communities. This limits their effectiveness on dynamic and complex queries. We therefore propose DA-RAG, which leverages attributed community search (ACS) to dynamically extract subgraphs relevant to the query. DA-RAG captures high-order graph structures, allowing for the retrieval of self-complementary knowledge. Furthermore, DA-RAG is equipped with a chunk-layer-oriented graph index, which facilitates efficient multi-granularity retrieval while significantly reducing both computational and economic costs. We evaluate DA-RAG on multiple datasets, demonstrating that it outperforms existing RAG methods by up to 40% in head-to-head comparisons across four metrics while reducing index construction time and token overhead by up to 37% and 41%, respectively.
