Augmenting Textual Generation via Topology Aware Retrieval
Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr
TL;DR
This work addresses LLM hallucination and limited input knowledge by introducing Topology-aware Retrieval-Augmented Generation (Topo-RAG), which guides retrieval using graph topology encoded in proximity-based and role-based relations. It shows that additional, topologically similar texts can improve generated content, and that textual similarity correlates with topological similarity across multiple domains. The framework precomputes topology embeddings to enable fast retrieval and reports gains on traditional text-generation metrics as well as on task-oriented evaluations such as node classification and link prediction. The results highlight the practical value of incorporating graph topology into RAG to improve factual grounding and writing quality in diverse text-attributed networks.
Abstract
Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-Augmented Generation (RAG) is employed as an effective strategy to enhance the available knowledge base and ground the responses in reality by pulling additional texts from external databases. In real-world applications, texts are often linked through entities within a graph, such as citations between academic papers or comments in social networks. This paper exploits these topological relationships to guide the retrieval process in RAG. Specifically, we explore two kinds of topological connections: proximity-based, which focuses on closely connected nodes, and role-based, which focuses on nodes sharing similar subgraph structures. Our empirical analysis confirms that both kinds of connection are relevant to relationships between texts, leading us to develop a Topology-aware Retrieval-Augmented Generation framework. This framework includes a retrieval module that selects texts based on their topological relationships and an aggregation module that integrates these texts into prompts for LLM text generation. We curate established text-attributed networks and conduct comprehensive experiments to validate the effectiveness of this framework, demonstrating its potential to enhance RAG with topological awareness.
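To make the two-module design concrete, the following is a minimal sketch of topology-guided retrieval followed by prompt aggregation. It is an illustration under stated assumptions, not the authors' implementation: the toy graph, the texts, and the function names (`proximity_score`, `role_score`, `retrieve`, `build_prompt`) are all hypothetical. Jaccard overlap of closed neighborhoods stands in for a proximity-based measure and degree similarity stands in for a role-based measure; the abstract does not specify which measures the framework actually uses.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]

# Hypothetical toy text-attributed graph (undirected adjacency sets).
GRAPH: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
# Hypothetical node texts, e.g. paper abstracts in a citation network.
TEXTS: Dict[str, str] = {n: f"Text attached to node {n}." for n in GRAPH}


def proximity_score(graph: Graph, u: str, v: str) -> float:
    """Assumed proximity proxy: Jaccard overlap of closed neighborhoods."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def role_score(graph: Graph, u: str, v: str) -> float:
    """Assumed role proxy: similarity of node degrees, in [0, 1]."""
    du, dv = len(graph[u]), len(graph[v])
    return 1.0 - abs(du - dv) / max(du, dv, 1)


def retrieve(graph: Graph, target: str, k: int,
             score_fn: Callable[[Graph, str, str], float]) -> List[str]:
    """Return the k nodes most topologically similar to the target."""
    candidates = [n for n in graph if n != target]
    # Highest score first; ties are broken by node id for determinism.
    candidates.sort(key=lambda n: (-score_fn(graph, target, n), n))
    return candidates[:k]


def build_prompt(texts: Dict[str, str], target: str,
                 retrieved: List[str]) -> str:
    """Aggregate retrieved texts and the target text into one prompt."""
    context = "\n".join(f"- {texts[n]}" for n in retrieved)
    return (f"Related texts:\n{context}\n\n"
            f"Target text:\n{texts[target]}\n\n"
            "Task: continue the target text.")


near = retrieve(GRAPH, "a", 2, proximity_score)  # closely connected nodes
alike = retrieve(GRAPH, "a", 2, role_score)      # structurally similar nodes
prompt = build_prompt(TEXTS, "a", near)
```

On this toy graph, the proximity proxy selects the two direct neighbors of `a`, whereas the role proxy also selects `d`, which is not adjacent to `a` but has the same degree. In a larger setting, the pairwise scores would presumably be replaced by precomputed topology embeddings so that retrieval reduces to a nearest-neighbor lookup.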
