Similarity-based Neighbor Selection for Graph LLMs
Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang
TL;DR
This work introduces Similarity-based Neighbor Selection (SNS), a prompt-based approach to node classification in text-attributed graphs that leverages LLMs without training. SNS consists of Recursive Neighbor Selection to gather labeled neighbors and a SimCSE-driven Similarity-based Ranking to choose the most informative neighbors for inclusion in prompts, addressing over-squashing and heterophily. Across five benchmarks, SNS consistently surpasses vanilla GNNs and prior prompt-based methods, achieving state-of-the-art results on PubMed and demonstrating strong zero-shot generalization and scalability. The study highlights the important role of graph structure integration for enabling LLMs to reason over graph-structured data and provides evidence that well-crafted neighbor selection can unlock substantial performance gains in TAGs.
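The two SNS components named above can be illustrated with a small sketch. It assumes that SimCSE sentence embeddings have already been computed for every node's text (toy 2-D vectors stand in for them here), and that "recursive" selection means expanding outward hop by hop until enough labeled nodes are found. The function names, the `min_count`/`max_hops` stopping rule, and the use of plain cosine similarity are hypothetical illustrations, not the authors' exact procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recursive_labeled_neighbors(adj, labels, target, min_count, max_hops):
    """Hypothetical recursive selection: expand breadth-first from `target`,
    one hop at a time, collecting labeled nodes until at least `min_count`
    have been found, `max_hops` is reached, or no new nodes remain."""
    visited = {target}
    frontier = [target]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for nb in adj.get(node, ()):
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
                    if nb in labels:
                        found.append(nb)
        if len(found) >= min_count or not next_frontier:
            break
        frontier = next_frontier
    return found


def rank_by_similarity(emb, target, candidates, k):
    """Keep the k candidates whose (assumed SimCSE) embeddings are most
    similar to the target node's embedding."""
    ranked = sorted(candidates, key=lambda n: cosine(emb[target], emb[n]), reverse=True)
    return ranked[:k]


# Toy graph: node 0 is the unlabeled target; nodes 2, 3, 4 carry labels.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
labels = {2: "cs.LG", 3: "cs.AI", 4: "cs.DB"}
emb = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0], 3: [0.9, 0.1], 4: [0.6, 0.8]}

candidates = recursive_labeled_neighbors(adj, labels, target=0, min_count=2, max_hops=3)
selected = rank_by_similarity(emb, 0, candidates, k=2)
print(candidates, selected)
```

Here one hop yields only node 2, so the search expands to two hops and gathers nodes 2, 3 and 4; ranking by similarity then keeps 3 and 4, dropping node 2 even though it is a direct neighbor. This mirrors the motivation in the text: proximity alone does not guarantee an informative (or same-class) neighbor under heterophily.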
Abstract
Text-attributed graphs (TAGs) present unique challenges for direct processing by Large Language Models (LLMs), yet the extensive commonsense knowledge and robust reasoning capabilities of LLMs offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective integration of graph information, further compounded by inconsistent dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS improves the quality of the selected neighbors, thereby enhancing the graph representation and alleviating issues such as over-squashing and heterophily. Moreover, as an inductive and training-free approach, SNS shows better generalization and scalability than traditional GNN methods. Our comprehensive experiments, which follow standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art node classification results on datasets such as PubMed, showcasing the potential of LLMs for understanding graph structure. Our research further underscores the importance of integrating graph structure in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.
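To make "simple prompt interactions" concrete, the sketch below shows one way ranked, labeled neighbors could be packed into a node-classification prompt and how a free-text LLM reply could be mapped back to a label. The prompt wording, the `build_prompt` and `parse_answer` helpers, and the earliest-match parsing rule are all hypothetical; the actual SNS prompt format is defined in the authors' repository, not here, and no LLM call is made.

```python
def build_prompt(target_text, neighbors, label_names):
    """Hypothetical prompt template.
    neighbors: list of (text, label) pairs, already ranked by similarity."""
    lines = [f"Target node: {target_text}", "Labeled neighbors:"]
    for i, (text, label) in enumerate(neighbors, 1):
        lines.append(f"{i}. {text} -> {label}")
    lines.append("Candidate categories: " + ", ".join(label_names))
    lines.append("Which category does the target node belong to? Answer with one category.")
    return "\n".join(lines)


def parse_answer(reply, label_names):
    """Return the label name that appears earliest in the reply
    (case-insensitive), or None if no label name is mentioned."""
    low = reply.lower()
    hits = [(low.find(name.lower()), name) for name in label_names if name.lower() in low]
    return min(hits)[1] if hits else None


prompt = build_prompt(
    "A paper on training deep neural networks",
    [("A survey of deep learning", "cs.LG"), ("Planning with search", "cs.AI")],
    ["cs.LG", "cs.AI", "cs.DB"],
)
print(prompt)
print(parse_answer("I think the answer is cs.LG.", ["cs.LG", "cs.AI", "cs.DB"]))
```

Returning `None` for an unparseable reply, rather than guessing a default class, keeps invalid LLM outputs visible so they can be counted or retried separately.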
