Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs
Shengyin Sun, Yuxiang Ren, Jiehao Chen, Chen Ma
TL;DR
This work addresses the fragility of graph neural networks (GNNs) on text-attributed graphs (TAGs) by using large language models (LLMs) to refine graph topology. It introduces two LLM-driven refinement mechanisms: edge deletion and addition based on LLM-estimated semantic similarity between node attributes, and pseudo-label propagation used as a regularizer to guide edge weighting. Both are integrated with LM fine-tuning and end-to-end GNN training, supported by shrinking-theory-based justification. The approach yields consistent, though modest, performance gains (approximately $0.15\%$ to $2.47\%$) across four real-world TAG datasets and various baselines. By showing that topology-aware LLM assistance can improve discrimination and edge weighting, the work highlights a promising direction for building more reliable graph representations in text-rich domains.
Abstract
Recent advances in large language models (LLMs) have revolutionized the field of natural language processing (NLP). Inspired by the success of LLMs in NLP tasks, some recent work has begun investigating the potential of applying LLMs to graph learning tasks. However, most existing work uses LLMs as powerful node feature augmenters, leaving the use of LLMs to enhance graph topological structure understudied. In this work, we explore how to leverage the information retrieval and text generation capabilities of LLMs to refine and enhance the topological structure of text-attributed graphs (TAGs) in the node classification setting. First, we propose using LLMs to help remove unreliable edges and add reliable ones in the TAG. Specifically, we prompt the LLM with carefully designed prompts to output the semantic similarity between node attributes, and then perform edge deletion and edge addition based on that similarity. Second, we propose using pseudo-labels generated by the LLM to improve graph topology: we introduce pseudo-label propagation as a regularization term that guides the graph neural network (GNN) in learning proper edge weights. Finally, we incorporate these two LLM-based topology refinement methods into GNN training and perform extensive experiments on four real-world datasets. The experimental results demonstrate the effectiveness of LLM-based graph topology refinement, which achieves a performance gain of 0.15% to 2.47% on public benchmarks.
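The two refinement ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the thresholds `delete_below` and `add_above`, and the exact form of the penalty are assumptions made for illustration, and the similarity scores and pseudo-labels are assumed to have already been obtained from an LLM.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def _norm(u: int, v: int) -> Edge:
    """Store undirected edges with the smaller node id first."""
    return (u, v) if u <= v else (v, u)


def refine_edges(
    edges: Iterable[Edge],
    similarity: Dict[Edge, float],
    delete_below: float = 0.3,  # hypothetical threshold, not from the paper
    add_above: float = 0.8,     # hypothetical threshold, not from the paper
) -> Set[Edge]:
    """Drop existing edges with low similarity and add pairs with high similarity.

    `similarity` maps node pairs to scores in [0, 1], assumed to come from an LLM.
    Existing edges without a score are kept unchanged.
    """
    sim = {_norm(u, v): s for (u, v), s in similarity.items()}
    refined: Set[Edge] = set()
    for u, v in edges:
        e = _norm(u, v)
        score = sim.get(e)
        if score is None or score >= delete_below:
            refined.add(e)
    for (u, v), score in sim.items():
        if u != v and score >= add_above:
            refined.add((u, v))
    return refined


def propagation_penalty(
    weights: Dict[Edge, float],
    pseudo_labels: Dict[int, int],
    num_classes: int,
) -> float:
    """Illustrative regularizer based on one step of pseudo-label propagation.

    Each node receives the weighted one-hot pseudo-labels of its neighbours.
    The penalty is the average share of received weight that disagrees with
    the node's own pseudo-label, so it shrinks when edges between nodes with
    different pseudo-labels get smaller weights.
    """
    received: Dict[int, list] = {}
    for (u, v), w in weights.items():
        for src, dst in ((u, v), (v, u)):
            mass = received.setdefault(dst, [0.0] * num_classes)
            mass[pseudo_labels[src]] += w
    total, count = 0.0, 0
    for node, mass in received.items():
        z = sum(mass)
        if z > 0:
            total += 1.0 - mass[pseudo_labels[node]] / z
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    edges = [(0, 1), (1, 2)]
    scores = {(0, 1): 0.9, (1, 2): 0.1, (0, 3): 0.85, (2, 3): 0.5}
    new_edges = refine_edges(edges, scores)
    print(sorted(new_edges))
    labels = {0: 0, 1: 0, 2: 1, 3: 1}
    uniform = {e: 1.0 for e in new_edges}
    print(propagation_penalty(uniform, labels, num_classes=2))
```

In an actual training loop the edge weights would be learnable parameters and a penalty of this kind would be added to the classification loss in differentiable form. The plain-Python version here only shows the direction of the signal.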
