DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

Wenqing Zhou, Yuxuan Yan, Qianqian Yang

TL;DR

This work tackles the privacy, latency, and scalability challenges of centralized Retrieval-Augmented Generation by introducing DGRAG, a distributed RAG system that leverages edge-local knowledge graphs and cloud-hosted subgraph summaries. It operates in two phases: a Distributed Knowledge Construction phase, which builds and summarizes local KGs, and a Collaborative Retrieval and Generation phase, which uses a gate mechanism to decide when cross-edge retrieval through the cloud is needed. Key contributions include Leiden-based subgraph partitioning, privacy-preserving subgraph summaries, and a gate-driven cross-edge retrieval pipeline that produces more precise answers when local knowledge is insufficient. Empirical results on the UltraDomain benchmark show that DGRAG outperforms Naïve RAG and Local RAG on both within-domain and cross-domain questions, while ablation studies support the importance of the gate mechanism and graph-based retrieval. The approach offers a scalable, privacy-conscious framework for real-world edge-cloud deployments in domains such as smart manufacturing and intelligent city systems.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the capabilities of language models by integrating external knowledge. Due to the diversity of data sources and the constraints of memory and computing resources, real-world data is often scattered across multiple devices. Conventional RAG systems that store massive amounts of scattered data centrally face increasing privacy concerns and high computational costs. Additionally, running RAG on a central node raises latency issues when searching over a large-scale knowledge base. To address these challenges, we propose a distributed Knowledge Graph-based RAG approach, referred to as DGRAG, in an edge-cloud system, where each edge device maintains a local knowledge base without the need to share it with the cloud, instead sharing only summaries of its knowledge. Specifically, DGRAG has two main phases. In the Distributed Knowledge Construction phase, DGRAG organizes local knowledge using knowledge graphs, generates subgraph summaries, and stores them in a summary database in the cloud as the shared information. In the Collaborative Retrieval and Generation phase, DGRAG first performs knowledge retrieval and answer generation locally, and a gate mechanism determines whether the query is beyond the scope of local knowledge or processing capabilities. For queries that exceed the local knowledge scope, the cloud retrieves knowledge from the most relevant edges based on the summaries and generates a more precise answer. Experimental results demonstrate the effectiveness of the proposed DGRAG approach in significantly improving the quality of question-answering tasks over baseline approaches.
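
The cloud-side step described above, matching a query against shared summaries to find the most relevant edges, can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine score stands in for whatever matching model DGRAG actually uses, and the names `top_m_edges`, `hit_rate`, the edge ids, and the summary texts are all hypothetical. The `hit_rate` helper mirrors the quantity plotted in Figure 2 (how often the correct edge appears among the top-$m$ matches).

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_m_edges(query: str, summaries: list[tuple[str, str]], m: int) -> list[str]:
    """Return the ids of the edges that own the m summaries closest to the query.

    `summaries` holds (edge_id, summary_text) pairs, i.e. the only
    information an edge has shared with the cloud in this sketch.
    """
    q = bag_of_words(query)
    ranked = sorted(
        summaries, key=lambda s: cosine(q, bag_of_words(s[1])), reverse=True
    )
    edges: list[str] = []
    for edge_id, _ in ranked[:m]:
        if edge_id not in edges:  # an edge may own several matching summaries
            edges.append(edge_id)
    return edges


def hit_rate(cases: list[tuple[str, str]], summaries: list[tuple[str, str]], m: int) -> float:
    """Fraction of (query, true_edge_id) cases whose true edge is in the top-m match."""
    if not cases:
        return 0.0
    hits = sum(1 for query, edge in cases if edge in top_m_edges(query, summaries, m))
    return hits / len(cases)


# Hypothetical summary database: two summaries from one edge, one each from two others.
SUMMARIES = [
    ("edge_cs", "algorithms data structures and compilers"),
    ("edge_cs", "neural networks and machine learning"),
    ("edge_agri", "soil irrigation and crop rotation"),
    ("edge_law", "contract law and court procedure"),
]

if __name__ == "__main__":
    print(top_m_edges("how does crop rotation affect soil", SUMMARIES, m=1))
```

Only summaries are ranked here; the raw local knowledge never leaves the edge, which is the property the abstract emphasizes. Raising `m` trades extra retrieval and transmission work for a better chance of reaching the right edge.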

Paper Structure

This paper contains 21 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview of the DGRAG framework, which includes two main phases: Distributed Knowledge Graph Construction (left) and Collaborative Retrieval and Generation (right).
  • Figure 2: Hit rate of subgraph summary matching versus the value of $m$ in top-$m$ subgraph summary matching.
  • Figure 3: (a) Total time cost of the Q&A task for different RAG approaches; (b) the time cost of each phase in DGRAG. The cross-edge retrieval mechanism consists of summary matching, knowledge retrieval, cloud generation, and several data transmissions. DGRAG(local) denotes the case where the gate mechanism determines that local knowledge is adequate and no cross-edge retrieval and generation is needed. DGRAG(global) denotes the case where the gate mechanism determines that local knowledge is insufficient and cross-edge retrieval and generation is required. DGRAG(avg) is the average time spent over all Q&A tasks handled by DGRAG.
  • Figure 4: A real case of subgraph summarization in Distributed Knowledge Graph Construction, taken from the CS domain. It shows the steps from information extraction and knowledge graph construction to subgraph partitioning and summarization.
  • Figure 5: A case of how DGRAG handles a Q&A task when local knowledge is adequate. The gate mechanism determines that the local responses have a high degree of similarity, so local knowledge is considered sufficient and the best local answer is selected as the final response. The whole workflow runs only at the querying edge.
  • ...and 1 more figure
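
The gate behavior described in the Figure 5 caption (similar local responses mean local knowledge is sufficient) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the paper's method: token-set Jaccard similarity, the `threshold` value of 0.5, and the rule for picking the "best" local answer (the response closest on average to the others) are all assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; an assumed stand-in for the real similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def gate(local_responses: list[str], threshold: float = 0.5) -> tuple[bool, Optional[str]]:
    """Return (needs_cross_edge, best_local_answer).

    needs_cross_edge is True when the sampled local responses disagree,
    which this sketch treats as a sign that local knowledge is insufficient.
    """
    n = len(local_responses)
    if n < 2:
        raise ValueError("need at least two local responses to compare")

    # Mean pairwise similarity across all sampled local responses.
    pairs = list(combinations(local_responses, 2))
    consistency = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    if consistency < threshold:
        return True, None  # escalate to cross-edge retrieval via the cloud

    # Assumed selection rule: the response most similar to all the others.
    best_idx = max(
        range(n),
        key=lambda i: sum(
            jaccard(local_responses[i], local_responses[j]) for j in range(n) if j != i
        ),
    )
    return False, local_responses[best_idx]


if __name__ == "__main__":
    consistent = [
        "the leiden algorithm partitions the graph",
        "the leiden algorithm partitions the graph into communities",
        "leiden algorithm partitions the graph",
    ]
    print(gate(consistent))
```

When the gate returns `True`, the query would be handed to the cloud for summary matching and cross-edge retrieval; otherwise the whole workflow stays at the querying edge, which corresponds to the DGRAG(local) case in Figure 3.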