RAGraph: A General Retrieval-Augmented Graph Learning Framework

Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang

TL;DR

RAGraph addresses the generalization gap of Graph Neural Networks by introducing a Retrieval-Augmented Graph Learning framework that imports external toy-graph knowledge via a dynamic key-value library. It builds a toy-graph vector base from a resource graph, retrieves top-k toy graphs by a multi-faceted similarity function, and propagates their hidden embeddings and task-specific outputs into a pre-trained GNN through intra- and inter-propagation prompts. The approach yields tuning-free improvements across node, edge, and graph tasks on static and dynamic datasets, outperforming state-of-the-art baselines and showing robustness to unseen data, especially with noise-based prompt tuning. This work broadens the applicability of RAG to structured graph data, offering a plug-and-play paradigm for enhancing generalization in Large Graph Models with potential impact on cross-domain graph learning and knowledge integration.

Abstract

Graph Neural Networks (GNNs) have become essential for interpreting relational data across various domains, yet they often struggle to generalize to unseen graph data that differs markedly from training instances. In this paper, we introduce a novel framework called General Retrieval-Augmented Graph Learning (RAGraph), which brings external graph data into the general graph foundation model to improve model generalization in unseen scenarios. Central to our framework is a toy graph vector library that we established, which captures key attributes such as features and task-specific label information. During inference, RAGraph retrieves similar toy graphs based on key similarities in downstream tasks, integrating the retrieved data to enrich the learning context via the message-passing prompting mechanism. Our extensive experimental evaluations demonstrate that RAGraph significantly outperforms state-of-the-art graph learning methods in multiple tasks such as node classification, link prediction, and graph classification across both dynamic and static datasets. Furthermore, extensive testing confirms that RAGraph consistently maintains high performance without the need for task-specific fine-tuning, highlighting its adaptability, robustness, and broad applicability.
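The retrieve-then-fuse idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain cosine similarity in place of the paper's multi-faceted similarity function, a softmax over similarities as the fusion weights, and a hypothetical mixing coefficient `gamma`. The names `LIBRARY`, `retrieve_top_k`, and `fuse`, and all numeric values, are made up for illustration.

```python
import math

# Hypothetical key-value library: each entry pairs a key embedding of a
# toy graph with its stored task-specific output vector (the value).
LIBRARY = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.0, 1.0], [0.1, 0.9]),
    ([0.7, 0.7], [0.5, 0.5]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_key, library, k):
    """Return the k (similarity, value) pairs whose keys best match the query."""
    scored = [(cosine(query_key, key), value) for key, value in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def fuse(query_output, retrieved, gamma=0.5):
    """Blend the query's own output with a similarity-weighted average of
    the retrieved values. gamma is an assumed mixing weight."""
    exp_sims = [math.exp(sim) for sim, _ in retrieved]
    total = sum(exp_sims)
    weights = [e / total for e in exp_sims]
    dim = len(query_output)
    pooled = [
        sum(w * value[i] for w, (_, value) in zip(weights, retrieved))
        for i in range(dim)
    ]
    return [(1 - gamma) * q + gamma * p for q, p in zip(query_output, pooled)]


QUERY_KEY = [1.0, 0.1]      # key embedding of the query graph (made up)
QUERY_OUTPUT = [0.6, 0.4]   # output of the pre-trained model alone (made up)

top = retrieve_top_k(QUERY_KEY, LIBRARY, k=2)
fused = fuse(QUERY_OUTPUT, top)
```

No parameters are trained in this sketch, which mirrors the tuning-free usage the abstract describes; the message-passing prompts that RAGraph uses to propagate retrieved knowledge are collapsed into a single weighted average here.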

Paper Structure

This paper contains 57 sections, 15 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: (a) RAG in NLP utilizes retrieval to enhance model responses, based on a query to retrieve related features (e.g., a tail, primarily feeds on mice) and answers (e.g., Cat). (b) In CV, RAG employs similar photo retrieval to enhance model comprehension, assisting in downstream tasks such as inpainting or image question answering. (c) For GNNs, RAG could leverage retrieval of similar historical subgraphs or scenarios to aid in graph-based tasks (e.g., recommendations or fraud detection).
  • Figure 2: The overall framework of RAGraph. ❶ Given resource graph ${\mathcal{G}}^{\mathcal{R}}$, we chunk it and augment toy graphs $\{G^{\mathcal{T}}\}$, and feed them into pre-trained GNNs to generate hidden embeddings via the encoder and task-specific output vectors via the decoder, which are stored as values. Keys such as environment, history, position-aware, and hidden embeddings are stored to form the key-value database of toy graphs $\mathcal{G}^{\mathcal{T}}$. ❷ For a given query graph $G^{\mathcal{Q}}$, the keys are fetched to retrieve the top-$K$ toy graphs $G_{\text{topK}}^\mathcal{T}$ from the database. ❸ Leveraging $G_{\text{topK}}^\mathcal{T}$, intra- and inter-propagation are performed to propagate hidden embeddings and task-specific output vectors, passing the retrieved knowledge to the center node $v_c$. Through a weighted fusion, the aggregated output is used to perform graph-, node-, and edge-level tasks.
  • Figure 3: Hyper-parameter study with hops $k$ from 1 to 5 (Left) and top-$k$ from 1 to 20 (Right) on node classification with the PROTEINS and ENZYMES datasets, using the setting in Table \ref{table.combined-classification}.
  • Figure 4: Performance comparisons of RAGraph and several baselines with different proportions of training and resource data.
  • Figure 5: Qualitative analyses of toy graph retrieval: how “generation” works.
  • ...and 1 more figures