GraphiT: Efficient Node Classification on Text-Attributed Graphs with Prompt Optimized LLMs

Shima Khoshraftar, Niaz Abedini, Amir Hajian

TL;DR

GraphiT tackles node classification on text-attributed graphs by converting neighborhood information into concise textual prompts and automatically optimizing the LLM prompts with the DSPy framework. It introduces neighbor keyphrases as an efficient graph encoding, reducing context length while preserving predictive power, and uses COPRO-based prompt optimization to tailor instructions and demonstrations without extensive manual tuning. Empirical results on Cora, PubMed, and Ogbn-arxiv show that GraphiT consistently surpasses vanilla LLM baselines and exceeds several prior LLM-based methods, and that on PubMed it is competitive with GCN; ablations confirm the effectiveness and token efficiency of neighbor keyphrases. The approach offers a reproducible, scalable pathway for leveraging LLMs in graph prediction tasks, with practical gains in both accuracy and cost from principled encoding and automated prompt design.
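As a rough illustration of the neighbor-keyphrase encoding, the sketch below renders one node and the keyphrases of its neighbors as a short classification prompt. The function name `build_node_prompt`, the prompt layout, the deduplication step, and the `max_keyphrases` cap are all assumptions made for this example; the summary above does not specify the exact template GraphiT uses.

```python
from typing import Sequence


def build_node_prompt(
    title: str,
    abstract: str,
    neighbor_keyphrases: Sequence[str],
    labels: Sequence[str],
    max_keyphrases: int = 10,  # hypothetical cap to keep the context short
) -> str:
    """Render one node and its neighborhood as a classification prompt."""
    # Deduplicate case-insensitively while preserving first-seen order.
    seen = set()
    unique = []
    for phrase in neighbor_keyphrases:
        key = phrase.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(phrase.strip())
    kept = unique[:max_keyphrases]

    # Assumed layout: node text first, then the compact neighborhood encoding.
    lines = [
        f"Title: {title}",
        f"Abstract: {abstract}",
        "Neighbor keyphrases: " + ("; ".join(kept) if kept else "(none)"),
        "Categories: " + ", ".join(labels),
        "Answer with exactly one category.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_node_prompt(
            title="Learning to act with policy gradients",
            abstract="We study gradient-based policy search.",
            neighbor_keyphrases=["reinforcement learning", "Policy search", "policy search"],
            labels=["Reinforcement Learning", "Neural Networks"],
        )
    )
```

The neighborhood contributes only a handful of short phrases rather than full neighbor abstracts or summaries, which is what keeps the prompt short.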

Abstract

The application of large language models (LLMs) to graph data has attracted considerable attention recently. LLMs allow us to use deep contextual embeddings from pretrained models in text-attributed graphs, where shallow embeddings are often used for the text attributes of nodes. However, it is still challenging to efficiently encode the graph structure and features into a sequential form for use by LLMs. In addition, the performance of an LLM alone is highly dependent on the structure of the input prompt, which limits its reliability and often requires iterative manual adjustments that can be slow, tedious, and difficult to replicate programmatically. In this paper, we propose GraphiT (Graphs in Text), a framework for encoding graphs into a textual format and optimizing LLM prompts for graph prediction tasks. Here we focus on node classification for text-attributed graphs. We encode the graph data for every node and its neighborhood into a concise text to enable LLMs to better utilize the information in the graph. We then programmatically optimize the LLM prompts using the DSPy framework to automate this step and make it more efficient and reproducible. GraphiT outperforms our LLM-based baselines on three datasets, and we show how the optimization step in GraphiT leads to measurably better results without manual prompt tweaking. We also demonstrate that our graph encoding approach is competitive with other graph encoding methods while being less expensive, because it uses significantly fewer tokens for the same task.
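The idea of replacing manual prompt tweaking with a programmatic search can be sketched without any LLM or library. The example below is a deliberately simplified stand-in, not DSPy's API and not the COPRO algorithm: it scores a fixed list of candidate instructions on a small labeled subset and keeps the best one. The names `select_instruction`, `accuracy`, and `stub_predict` are hypothetical, and `stub_predict` is a toy rule that takes the place of a real LLM call.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]              # (encoded node text, gold label)
Predictor = Callable[[str, str], str]  # (instruction, node text) -> predicted label


def accuracy(instruction: str, examples: Sequence[Example], predict: Predictor) -> float:
    """Fraction of examples the predictor labels correctly under this instruction."""
    if not examples:
        return 0.0
    correct = sum(predict(instruction, text) == label for text, label in examples)
    return correct / len(examples)


def select_instruction(
    candidates: Sequence[str], examples: Sequence[Example], predict: Predictor
) -> Tuple[str, float]:
    """Return the candidate instruction with the highest accuracy on the subset."""
    scored = [(accuracy(c, examples, predict), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def stub_predict(instruction: str, text: str) -> str:
    # Toy stand-in for an LLM call (assumption): it only uses the neighbor
    # hints when the instruction explicitly mentions keyphrases.
    if "keyphrases" in instruction.lower() and "reinforcement" in text.lower():
        return "Reinforcement Learning"
    return "Neural Networks"


if __name__ == "__main__":
    subset = [
        ("Policy gradients. Neighbor keyphrases: reinforcement learning", "Reinforcement Learning"),
        ("Backprop in deep nets. Neighbor keyphrases: convolution", "Neural Networks"),
    ]
    candidates = [
        "Classify the paper.",
        "Classify the paper using its neighbor keyphrases.",
    ]
    print(select_instruction(candidates, subset, stub_predict))
```

In a real setup the candidates would be proposed and refined by the optimizer rather than written by hand, and the metric would be evaluated with actual model calls; the selection loop is the only part this sketch tries to convey.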

Paper Structure

This paper contains 17 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The general framework of GraphiT. First, node features, including neighbor keyphrases, are extracted for each node in the graph. Next, a small subset of nodes, along with an initial prompt, is fed into DSPy to produce an optimized prompt. Finally, node classification is performed using the optimized prompt.
  • Figure 2: Histogram of the ratio of the number of tokens in the neighbor summaries to the number of tokens obtained from the keyphrase extraction (KPE) approach. Applying KPE to the node neighbors yields significantly fewer tokens than the summarization method, with minimal impact on the quality of the classification results. The shorter input context translates into lower LLM API costs.
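
The quantity plotted in Figure 2 can be computed per node as the token count of the neighbor summary divided by the token count of the neighbor keyphrases. The sketch below is one way to do that under stated assumptions: `count_tokens` uses whitespace splitting as a stand-in for the LLM tokenizer (which the caption does not name), the bin width is arbitrary, and the example texts are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens approximate the model's tokenizer.
    return len(text.split())


def token_ratios(summaries: Sequence[str], keyphrase_texts: Sequence[str]) -> List[float]:
    """Per-node ratio: summary tokens / keyphrase tokens."""
    ratios = []
    for summary, keyphrases in zip(summaries, keyphrase_texts):
        kp_tokens = count_tokens(keyphrases)
        if kp_tokens == 0:
            continue  # skip nodes with no keyphrases to avoid dividing by zero
        ratios.append(count_tokens(summary) / kp_tokens)
    return ratios


def histogram(ratios: Sequence[float], bin_width: float = 1.0) -> Dict[float, int]:
    """Count ratios per bin; each key is the left edge of its bin."""
    counts = Counter(int(r // bin_width) for r in ratios)
    return {b * bin_width: counts[b] for b in sorted(counts)}


if __name__ == "__main__":
    summaries = [
        "graph neural networks aggregate features from neighboring nodes to classify papers",
        "reinforcement learning agents learn policies by trial and error",
    ]
    keyphrase_texts = [
        "graph neural networks; node classification",
        "reinforcement learning; policy",
    ]
    print(histogram(token_ratios(summaries, keyphrase_texts)))
```

A ratio above 1 means the summary is longer than the keyphrase encoding for that node, so mass to the right of 1 in the histogram corresponds to the token savings described in the caption.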

Theorems & Definitions (2)

  • Definition 1
  • Definition 2