Rethinking Graph Structure Learning in the Era of LLMs
Zhihan Zhang, Xunkai Li, Zhu Lei, Guang Zeng, Ronghua Li, Guoren Wang
TL;DR
This work rethinks graph structure learning (GSL) for text-attributed graphs (TAGs) in the era of large language models (LLMs) by introducing LLaTA, a training-free, decoupled framework that reframes GSL as a language-guided tree optimization task. It constructs a topology-aware structural encoding tree via structural entropy minimization, then uses tree-prompted LLM in-context inference with a Community of Thought mechanism to jointly understand topology and node text. A leaf-oriented two-step sampling procedure guides training-free graph refinement, achieving state-of-the-art performance across 11 TAG datasets while avoiding costly fine-tuning. The method is robust and efficient, scales to large graphs, and provides a practical paradigm for integrating LLMs with GSL in real-world TAG applications.
Abstract
Recently, the emergence of LLMs has prompted researchers to integrate language descriptions into graphs, aiming to enhance model encoding capabilities from a data-centric perspective. The resulting graphs are called text-attributed graphs (TAGs). A review of prior advancements highlights that graph structure learning (GSL) is a pivotal technique for improving data utility, making it highly relevant to efficient TAG learning. However, most GSL methods are tailored for traditional graphs without textual information, underscoring the necessity of developing a new GSL paradigm. Despite this clear motivation, two challenges remain: (1) How can we define a reasonable optimization objective for GSL in the era of LLMs, considering the massive number of parameters in LLMs? (2) How can we design an efficient model architecture that enables seamless integration of LLMs for this optimization objective? For Question 1, we reformulate existing GSL optimization objectives as a tree optimization framework, shifting the focus from obtaining a well-trained edge predictor to a language-aware tree sampler. For Question 2, we propose decoupled and training-free model design principles for LLM integration, shifting the focus from computation-intensive fine-tuning to more efficient inference. Based on this, we propose Large Language and Tree Assistant (LLaTA), which leverages tree-based LLM in-context learning to enhance the understanding of topology and text, enabling reliable inference and generating improved graph structure. Extensive experiments on 11 datasets demonstrate that LLaTA offers flexibility (it can be incorporated with any backbone), scalability (it outperforms other LLM-enhanced graph learning methods), and effectiveness (it achieves SOTA predictive performance).
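
The TL;DR states that the structural encoding tree is built by structural entropy minimization. As a rough illustration of the quantity being minimized, the sketch below computes the standard two-level structural entropy of an undirected, unweighted graph under a given community partition (root, communities, leaf nodes). It is not LLaTA's implementation: the function name `structural_entropy`, the edge-list input format, and the restriction to a two-level tree are assumptions made for this example only.

```python
import math
from collections import defaultdict


def structural_entropy(edges, partition):
    """Two-level structural entropy of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping every node that appears in edges to a community id.
    Hypothetical helper for illustration; isolated nodes are not handled.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total_volume = sum(degree.values())  # equals 2 * number of edges

    volume = defaultdict(int)  # sum of node degrees inside each community
    cut = defaultdict(int)     # edges with exactly one endpoint in the community
    for node, d in degree.items():
        volume[partition[node]] += d
    for u, v in edges:
        if partition[u] != partition[v]:
            cut[partition[u]] += 1
            cut[partition[v]] += 1

    entropy = 0.0
    # Leaf terms: cost of locating a node inside its community.
    for node, d in degree.items():
        entropy -= d / total_volume * math.log2(d / volume[partition[node]])
    # Community terms: cost of locating a community inside the whole graph.
    for community, vol in volume.items():
        entropy -= cut[community] / total_volume * math.log2(vol / total_volume)
    return entropy


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mixed = {0: "a", 3: "a", 1: "b", 4: "b", 2: "c", 5: "c"}
single = {node: "all" for node in range(6)}

h_triangle = structural_entropy(edges, by_triangle)
h_mixed = structural_entropy(edges, mixed)
h_single = structural_entropy(edges, single)
```

On this toy graph, the partition that follows the two triangles yields a lower entropy than either the mixed partition or the single-community partition, which is the sense in which minimizing structural entropy favors a tree that reflects the graph's community structure. A deeper encoding tree would apply the same per-node term at every non-root tree node.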
