
DA-RAG: Dynamic Attributed Community Search for Retrieval-Augmented Generation

Xingyuan Zeng, Zuohan Wu, Yue Wang, Chen Zhang, Quanming Yao, Libin Zheng, Jian Yin

TL;DR

DA-RAG addresses the limitations of static, low-order graph approaches in retrieval-augmented generation by introducing Embedding-Attributed Community Search (EACS), which dynamically extracts high-order, query-relevant subgraphs. It pairs an offline chunk-layer-based index with online coarse-to-fine retrieval across three layers (Semantic Chunk Layer, Knowledge Graph Layer, Similarity Layer). A $k$-truss constraint combined with a query-relevance objective gives the retrieved subgraph provable cohesion and bounded reasoning hops. The online EACS problem is solved with the Q-Peel heuristic, whose complexity, $O(m^{1.5} + c n^2 t)$, and robustness are analyzed. On the UltraDomain and News Articles datasets, DA-RAG outperforms baselines by up to 40% in head-to-head comparisons and reduces index construction time and token overhead by up to 37% and 41%, respectively. Overall, DA-RAG shows that dynamic, multi-layered, high-order graph structures can substantially improve retrieval quality and efficiency in RAG systems, with practical relevance to search, summarization, and knowledge-grounded AI applications.
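
The retrieval objective above combines two ingredients: a $k$-truss constraint (every edge of the retrieved subgraph lies in at least $k-2$ triangles inside that subgraph) and a query-relevance objective. The sketch below is a hypothetical, simplified peeling heuristic in that spirit, not the paper's Q-Peel. It assumes node relevance is the cosine similarity between a node embedding and the query embedding, scores a community by mean relevance, and repeatedly removes the least relevant node while restoring the $k$-truss. The names `k_truss`, `greedy_peel`, and the mean-relevance score are illustrative assumptions, and the naive truss peeling used here does not attain the $O(m^{1.5})$ bound.

```python
import math
from collections import deque

Graph = dict[int, set[int]]  # undirected graph as adjacency sets (Python 3.9+)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def k_truss(adj: Graph, k: int) -> Graph:
    """Maximal k-truss: repeatedly delete edges lying in fewer than k - 2 triangles."""
    g = {u: set(vs) for u, vs in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in list(g):
            for v in list(g[u]):
                # support(u, v) = number of common neighbours = triangles through (u, v)
                if u < v and len(g[u] & g[v]) < k - 2:
                    g[u].discard(v)
                    g[v].discard(u)
                    changed = True
    return {u: vs for u, vs in g.items() if vs}  # drop isolated nodes


def component(g: Graph, seed: int) -> set[int]:
    """Nodes reachable from seed (empty if seed was peeled away)."""
    if seed not in g:
        return set()
    seen, queue = {seed}, deque([seed])
    while queue:
        for v in g[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_peel(adj: Graph, emb: dict[int, list[float]], query: list[float], k: int) -> set[int]:
    """Hypothetical Q-Peel-style heuristic: maximise mean query relevance under a k-truss constraint."""
    sim = {u: cosine(emb[u], query) for u in adj}
    g = k_truss(adj, k)
    if not g:
        return set()
    seed = max(g, key=sim.get)  # most query-relevant node that survives the truss
    comm = component(g, seed)
    best, best_score = set(comm), sum(sim[u] for u in comm) / len(comm)
    while True:
        candidates = [u for u in comm if u != seed]
        if not candidates:
            break
        worst = min(candidates, key=sim.get)  # least relevant non-seed node
        sub = {u: (g[u] & comm) - {worst} for u in comm if u != worst}
        g = k_truss(sub, k)  # restore cohesion after the removal
        comm = component(g, seed)
        if not comm:
            break
        score = sum(sim[u] for u in comm) / len(comm)
        if score > best_score:
            best, best_score = set(comm), score
    return best


def from_edges(edges: list[tuple[int, int]]) -> Graph:
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy graph: a K4 on {0, 1, 2, 3} plus a triangle {0, 4, 5} hanging off node 0.
adj = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (0, 5), (4, 5)])
emb = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0], 4: [0.6, 0.8], 5: [0.6, 0.8]}
query = [1.0, 0.0]
print(sorted(greedy_peel(adj, emb, query, k=3)))  # [0, 1, 2]
print(sorted(greedy_peel(adj, emb, query, k=4)))  # [0, 1, 2, 3]
```

On the toy graph, with $k=3$ the peel drops the irrelevant node 3 and then the triangle {0, 4, 5}, settling on the fully relevant triangle {0, 1, 2}. With $k=4$ only the K4 survives the truss, so the low-relevance node 3 is kept: removing it would leave edges with too little triangle support, which illustrates how the cohesion constraint trades off against relevance.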

Abstract

Owing to their unprecedented comprehension capabilities, large language models (LLMs) have become indispensable components of modern web search engines. Technically, this integration is an instance of retrieval-augmented generation (RAG), which enhances LLMs by grounding them in external knowledge bases. A prevalent approach in this setting is graph-based RAG (G-RAG). However, current G-RAG methods often underutilize graph topology, focusing predominantly on low-order structures or pre-computed static communities, which limits their effectiveness on dynamic and complex queries. We therefore propose DA-RAG, which leverages attributed community search (ACS) to dynamically extract subgraphs relevant to the query. DA-RAG captures high-order graph structures, enabling the retrieval of self-complementary knowledge. Furthermore, DA-RAG is equipped with a chunk-layer-oriented graph index that enables efficient multi-granularity retrieval while substantially reducing both computational and monetary costs. Experiments on multiple datasets show that DA-RAG outperforms existing RAG methods by up to 40% in head-to-head comparisons across four metrics, while reducing index construction time and token overhead by up to 37% and 41%, respectively.
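
The abstract credits the chunk-layer-oriented index with efficient multi-granularity retrieval, and Figure 2 describes the online stage as coarse-to-fine. A minimal sketch of that general idea, under stated assumptions: each chunk carries an embedding and links to knowledge-graph entities; the coarse step ranks chunks by cosine similarity to the query, and the fine step ranks only the entities linked to the selected chunks. `Chunk`, `coarse_to_fine`, and this two-step scoring are hypothetical illustrations, not DA-RAG's actual pipeline, which also involves a Similarity Layer and EACS.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Chunk:
    """Hypothetical chunk-layer node: chunk text, its embedding, and the KG entities it mentions."""
    text: str
    emb: list[float]
    entities: list[str] = field(default_factory=list)


def coarse_to_fine(query, chunks, entity_emb, n_chunks=2, n_entities=3):
    """Coarse: keep the n_chunks most similar chunks. Fine: rank only their linked entities."""
    top = sorted(chunks, key=lambda c: cosine(c.emb, query), reverse=True)[:n_chunks]
    candidates = {e for c in top for e in c.entities}  # fine search space shrinks to these
    # Sort by descending similarity, breaking ties by name for a deterministic order.
    ranked = sorted(candidates, key=lambda e: (-cosine(entity_emb[e], query), e))
    return top, ranked[:n_entities]


chunks = [
    Chunk("soil", [0.9, 0.1], ["nitrogen", "soil pH"]),
    Chunk("weather", [0.1, 0.9], ["rainfall"]),
    Chunk("crops", [0.8, 0.2], ["wheat", "nitrogen"]),
]
entity_emb = {"nitrogen": [1.0, 0.0], "soil pH": [0.7, 0.3], "rainfall": [0.0, 1.0], "wheat": [0.5, 0.5]}
top, entities = coarse_to_fine([1.0, 0.0], chunks, entity_emb, n_chunks=2, n_entities=2)
print([c.text for c in top], entities)  # ['soil', 'crops'] ['nitrogen', 'soil pH']
```

Because the fine step only scores entities reachable from a few selected chunks, most of the graph is never touched for a given query; this is one plausible way a layered index can lower retrieval cost, though the sketch does not establish that DA-RAG's reported savings come from this mechanism.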

Paper Structure

This paper contains 25 sections, 25 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Differences between existing methods and our method. (a) Methods without community consideration are limited to low-order graph topology and capture only partial aspects. (b) Methods with static community partitions may return divergent, unfocused responses. (c) Our method retrieves a query-relevant subgraph tailored to the question's needs.
  • Figure 2: Overview of the DA-RAG framework: (a) Offline Indexing creates a novel graph index from source documents, comprising a high-level layer ($L_C$) and two granular layers ($L_{KG}$ and $L_S$). (b) Online Retrieval employs a coarse-to-fine strategy. (c) EACS Formulation defines subgraph retrieval in G-RAG as Embedding-Attributed Community Search (EACS), which ensures provable cohesion and bounded reasoning hops and mitigates free-rider effects.
  • Figure 3: An illustration mapping the G-RAG subgraph retrieval task to the Attributed Community Search problem.
  • Figure 4: Efficiency comparison. DA-RAG is more efficient than ArchRAG and GraphRAG and, at comparable efficiency, more effective than LightRAG.
  • Figure 5: Analysis of retrieved subgraph quality on the Agriculture dataset. DA-RAG's subgraphs exhibit superior structural cohesiveness (highest density and lowest diameter) and semantic relevance (highest QRScore and similarity) compared with all baseline methods.
  • ...and 1 more figure
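
Figure 5 reports structural cohesiveness as density (higher is better) and diameter (lower is better). A small sketch of both measures on a retrieved node set follows, assuming the standard definitions: edge density $2|E|/(|V|(|V|-1))$ of the induced subgraph, and diameter as the longest shortest-path length in hops. The helper names are hypothetical, whether the paper uses exactly these definitions is an assumption, and QRScore and the similarity measure are not reproduced because their definitions are not given here.

```python
import math
from collections import deque

Graph = dict[int, set[int]]  # undirected graph as adjacency sets (Python 3.9+)


def induced(adj: Graph, nodes: set[int]) -> Graph:
    """Subgraph induced by `nodes` (keeps only edges with both ends inside)."""
    return {u: adj[u] & nodes for u in nodes}


def density(sub: Graph) -> float:
    """Edge density 2|E| / (|V|(|V|-1)); equals 1.0 for a clique."""
    n = len(sub)
    m = sum(len(vs) for vs in sub.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def diameter(sub: Graph) -> float:
    """Longest shortest-path length in hops (BFS from every node); math.inf if disconnected."""
    best = 0
    for src in sub:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in sub[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(sub):
            return math.inf
        best = max(best, max(dist.values()))
    return best


# Toy graph: a K4 on {0, 1, 2, 3} with a tail 3-4-5.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
for nodes in ({0, 1, 2, 3}, {0, 1, 2, 3, 4, 5}):
    sub = induced(adj, nodes)
    print(sorted(nodes), round(density(sub), 3), diameter(sub))
```

In this toy graph, the K4 has density 1.0 and diameter 1, while adding the tail lowers density and raises the diameter to 3. A smaller diameter means any two retrieved entities are separated by fewer hops, which is one way to read the link between these metrics and the "bounded reasoning hops" property.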