Similarity-based Neighbor Selection for Graph LLMs

Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang

TL;DR

This work introduces Similarity-based Neighbor Selection (SNS), a prompt-based approach to node classification in text-attributed graphs that leverages LLMs without training. SNS consists of Recursive Neighbor Selection to gather labeled neighbors and a SimCSE-driven Similarity-based Ranking to choose the most informative neighbors for inclusion in prompts, addressing over-squashing and heterophily. Across five benchmarks, SNS consistently surpasses vanilla GNNs and prior prompt-based methods, achieving state-of-the-art results on PubMed and demonstrating strong zero-shot generalization and scalability. The study highlights the important role of graph structure integration for enabling LLMs to reason over graph-structured data and provides evidence that well-crafted neighbor selection can unlock substantial performance gains in TAGs.

Abstract

Text-attributed graphs (TAGs) present unique challenges for direct processing by Large Language Models (LLMs), yet LLMs' extensive commonsense knowledge and robust reasoning capabilities offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Moreover, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.
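
The two-stage idea described above (gather labeled neighbors, then rank them by textual similarity to the target node and place the top k in the prompt, most similar first) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. Everything here is an assumption made for illustration: the function names, the toy graph, the hop-widening stopping rule, the prompt wording, and above all the bag-of-words cosine similarity, which is a stand-in for SimCSE sentence embeddings.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (NOT SimCSE): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def labeled_neighbors(adj, labels, target, max_hops, min_count):
    """Widen the search one hop at a time until enough labeled nodes are found.

    The stopping rule (stop once min_count labeled nodes are collected) is an
    assumption of this sketch.
    """
    seen = {target}
    frontier = [target]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for nb in adj.get(node, []):
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
                    if nb in labels:
                        found.append(nb)
        if len(found) >= min_count or not next_frontier:
            break
        frontier = next_frontier
    return found


def select_neighbors(adj, texts, labels, target, k=2, max_hops=2):
    """Rank labeled candidates by similarity to the target and keep the top k."""
    candidates = labeled_neighbors(adj, labels, target, max_hops, min_count=k)
    target_vec = embed(texts[target])
    ranked = sorted(candidates, key=lambda n: cosine(target_vec, embed(texts[n])), reverse=True)
    return ranked[:k]


def build_prompt(texts, labels, target, neighbors):
    """Place the selected neighbors in the prompt, most similar first (hypothetical wording)."""
    lines = [f"Target paper: {texts[target]}"]
    for i, n in enumerate(neighbors, 1):
        lines.append(f"Neighbor {i}: {texts[n]} (label: {labels[n]})")
    lines.append("Question: which category does the target paper belong to?")
    return "\n".join(lines)


# Hypothetical toy graph: node 1 is unlabeled, nodes 2 and 3 carry labels.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
texts = {
    0: "insulin therapy for type 2 diabetes",
    1: "graph neural networks",
    2: "type 2 diabetes insulin resistance",
    3: "type 1 diabetes insulin",
}
labels = {2: "Type 2", 3: "Type 1"}

selected = select_neighbors(adj, texts, labels, target=0, k=2, max_hops=2)
prompt = build_prompt(texts, labels, 0, selected)
```

In this toy graph only one labeled node sits within one hop of node 0, so the search widens to the second hop before ranking. Swapping `embed` for a real sentence encoder would change the similarity scores but not the overall flow.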

Paper Structure (25 sections, 7 figures, 10 tables)

This paper contains 25 sections, 7 figures, 10 tables.

Figures (7)

  • Figure 1: An overview of SNS and the prompt for LLMs. The top 2 similar neighbors are ranked and selected according to similarity determined by SimCSE (Gao et al., 2021), and subsequently incorporated into the prompts sequentially from most to least related.
  • Figure 2: Top-k neighbors accuracy of SNS, random neighbor selection, and LLMs' vanilla zero-shot accuracy across 3 datasets.
  • Figure 3: Comparison between Recursive Neighbor Selection and Direct Selection across different hops ($\gamma$).
  • Figure 4: The results of SNS and 2-hop random neighbor selection for Cora across different numbers of neighbors (k).
  • Figure 5: The performance of SNS across LLMs with varying capabilities on Cora.
  • ...and 2 more figures