Toward General and Robust LLM-enhanced Text-attributed Graph Learning
Zihao Zhang, Xunkai Li, Rong-Hua Li, Bing Zhou, Zhenjun Li, Guoren Wang
TL;DR
This work tackles the lack of a unified, robust framework for LLM-enhanced learning on Text-attributed Graphs (TAGs) by introducing UltraTAG and its robust instantiation, UltraTAG-S. UltraTAG provides a unified pipeline covering data augmentation, text encoding, and training mechanisms, while UltraTAG-S adds LLM-based robustness enhancements, resilient representation learning, and a graph-enhanced classifier to cope with node and edge sparsity. Across seven diverse TAG datasets and varying sparsity levels, UltraTAG-S achieves state-of-the-art performance, and its gains grow as data sparsity increases, indicating strong robustness. The approach combines text propagation, text augmentation, and structure-aware edge reconfiguration with dual-GNN classifiers, which makes it practical for real-world applications where textual attributes are incomplete or sparse.
Abstract
Recent advancements in Large Language Models (LLMs) and the proliferation of Text-Attributed Graphs (TAGs) across various domains have positioned LLM-enhanced TAG learning as a critical research area. By utilizing rich graph descriptions, this paradigm leverages LLMs to generate high-quality embeddings, thereby enhancing the representational capacity of Graph Neural Networks (GNNs). However, the field faces two significant challenges: (1) the absence of a unified framework to systematize the diverse optimization perspectives arising from the complex interactions between LLMs and GNNs, and (2) the lack of a robust method capable of handling real-world TAGs, which often suffer from text and edge sparsity that leads to suboptimal performance. To address these challenges, we propose UltraTAG, a unified pipeline for LLM-enhanced TAG learning. UltraTAG provides a unified, comprehensive, and domain-adaptive framework that not only organizes existing methodologies but also paves the way for future advancements in the field. Building on this framework, we propose UltraTAG-S, a robust instantiation of UltraTAG designed to tackle the inherent sparsity issues in real-world TAGs. UltraTAG-S employs LLM-based text propagation and text augmentation to mitigate text sparsity, while leveraging PageRank-based, LLM-augmented node selection together with edge reconfiguration strategies to address edge sparsity. Our extensive experiments demonstrate that UltraTAG-S significantly outperforms existing baselines, achieving improvements of 2.12% and 17.47% in the ideal and sparse settings, respectively. Moreover, the performance improvement of UltraTAG-S rises as the data sparsity ratio increases, which underscores its effectiveness and robustness.
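UltraTAG-S handles edge sparsity partly through PageRank-based node selection. The sketch below illustrates only that ranking step. It computes PageRank by power iteration over an undirected adjacency list and returns k nodes ordered by score. The names `pagerank` and `select_nodes`, the damping factor of 0.85, and the `highest` flag are hypothetical choices made for this illustration. The abstract does not say whether high-ranked or low-ranked nodes are selected, and it does not describe how the LLM then augments those nodes or reconfigures edges. Treat this as a generic sketch, not the authors' method.

```python
from typing import Dict, List


def pagerank(adj: Dict[int, List[int]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[int, float]:
    """Power-iteration PageRank on an undirected graph given as adjacency lists.

    Every neighbor listed in adj must also be a key of adj (symmetric lists).
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank mass held by isolated (dangling) nodes is redistributed uniformly,
        # so the scores keep summing to 1.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            deg = len(adj[v])
            for u in adj[v]:
                # Each node splits its damped rank evenly among its neighbors.
                new[u] += damping * rank[v] / deg
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def select_nodes(adj: Dict[int, List[int]], k: int,
                 highest: bool = True) -> List[int]:
    """Return k node ids ranked by PageRank (ties broken by node id).

    ASSUMPTION: whether UltraTAG-S targets high- or low-ranked nodes is not
    stated in the abstract, so both directions are exposed via `highest`.
    """
    scores = pagerank(adj)
    if highest:
        key = lambda v: (-scores[v], v)
    else:
        key = lambda v: (scores[v], v)
    return sorted(scores, key=key)[:k]


if __name__ == "__main__":
    # Star graph: hub 0 linked to 1, 2, 3; node 4 has no edges.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: []}
    print(select_nodes(star, 2))                 # hub first, then node 1
    print(select_nodes(star, 1, highest=False))  # the isolated node
```

On this small star graph, the hub receives the highest score and the isolated node receives the lowest. This is why PageRank is a natural signal for locating structurally sparse regions of a TAG. A real pipeline would likely use an optimized implementation such as NetworkX's `pagerank` rather than this pure-Python loop.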
