Let Your Graph Do the Talking: Encoding Structured Data for LLMs

Bryan Perozzi; Bahare Fatemi; Dustin Zelle; Anton Tsitsulin; Mehran Kazemi; Rami Al-Rfou; Jonathan Halcrow

TL;DR

This paper tackles the problem of efficiently encoding structured graph data for large language models. It introduces GraphToken, a parameter-efficient encoder that produces a small set of graph tokens via a GNN and projects them into the LLM embedding space, with the LLM kept frozen during training. Through extensive experiments on the GraphQA benchmark, GraphToken substantially improves performance across graph-level, node-level, and edge-level tasks, and analyses show encoder choice and feature design critically influence results. The work demonstrates that explicit graph representations in the prompt space can significantly enhance reasoning with LLMs, offering a practical approach to leveraging structured data without large-scale fine-tuning.

Abstract

How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work, which focuses on limited domains (e.g., knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure allows significant improvements on graph reasoning tasks. Specifically, we see across-the-board improvements, of up to 73 percentage points, on node-, edge-, and graph-level tasks from the GraphQA benchmark.
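
As a rough illustration of the idea summarized above (a graph encoder emits a few graph tokens that extend the prompt of a frozen LLM), here is a minimal sketch in plain Python. It is not the authors' implementation: the mean-neighbor message passing step, the mean pooling, the layer sizes, the placement of graph tokens before the question tokens, and every function name (`graph_tokens`, `build_prompt`, and so on) are assumptions made for illustration. The random matrices stand in for weights that would be learned while the LLM stays frozen.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mean_neighbors(adj, feats):
    """One message-passing step (assumed): average a node with its neighbors."""
    out = []
    for i, row in enumerate(adj):
        idx = [i] + [j for j, e in enumerate(row) if e]
        out.append([sum(feats[j][d] for j in idx) / len(idx)
                    for d in range(len(feats[0]))])
    return out

def graph_tokens(adj, feats, w_gnn, w_proj, num_tokens, llm_dim):
    """Encode a graph into `num_tokens` vectors of the LLM embedding width."""
    h = matmul(mean_neighbors(adj, feats), w_gnn)       # [n, hidden]
    h = [[max(0.0, v) for v in row] for row in h]       # ReLU
    pooled = [[sum(col) / len(h) for col in zip(*h)]]   # mean pool -> [1, hidden]
    flat = matmul(pooled, w_proj)[0]                    # [num_tokens * llm_dim]
    return [flat[t * llm_dim:(t + 1) * llm_dim] for t in range(num_tokens)]

def build_prompt(graph_toks, question_embs):
    """Assumed layout: graph tokens first, then the embedded question tokens."""
    return graph_toks + question_embs

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Stand-in for learned weights or frozen-LLM embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # 3-node path graph
feats = rand_matrix(3, 4)                 # hypothetical node features
w_gnn = rand_matrix(4, 8)                 # trainable encoder weights
w_proj = rand_matrix(8, 2 * 16)           # trainable projection to LLM space
tokens = graph_tokens(adj, feats, w_gnn, w_proj, num_tokens=2, llm_dim=16)
question_embs = rand_matrix(5, 16)        # pretend output of the frozen embedder
prompt = build_prompt(tokens, question_embs)
print(len(prompt), len(prompt[0]))
```

In a setup like this, only `w_gnn` and `w_proj` would receive gradient updates; the LLM that consumes `prompt` is left untouched, which is what makes the approach parameter-efficient.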

Paper Structure (26 sections, 2 equations, 5 figures, 15 tables)

Introduction
Background
Large Language Models
Pre-Trained Large Language Models (LLMs):
Parameter-Efficient Fine-Tuning:
Graph Encoding with Neural Networks
Graphs and LLMs
GraphToken
Architecture
Training procedure
Experiments
Datasets.
Setting.
Experiment 1: GraphToken Performance
Results.
...and 11 more sections

Figures (5)

Figure 1: Graph encoding options for a frozen LLM. a) Fixed encoding, e.g., fatemi2023talk, wang2023can, stechly2023gpt. b) This work proposes GraphToken, a learned graph prompt function that explicitly encodes graphs in a parameter-efficient way.
Figure 2: A visual overview of the architecture of GraphToken. The framework takes a graph and a corresponding question as input. The graph encoder takes the graph and generates graph tokens. The question is tokenized and embedded to question tokens. A frozen LLM leverages the graph and question tokens to generate an answer.
Figure 3: Effect of varying node features used in the graph encoder. Results shown are the performance difference from the Soft Prompt baseline on GraphQA$_{\text{Test}}$. We see that breaking equivariance via learned features generally improves model performance, but the combination of learned and spectral features proves uniquely powerful for some encoders.
Figure 4: UMAP (mcinnes2018umap) projection of GraphToken embeddings produced by two different encoders, colored by the diameter of a graph. We plot all 8-node graphs.
Figure 5: Figurative illustrations of set-based GNN architectures employed in the paper. We pool representations from either nodes or edges, transform them via an MLP with shared weights, pool, and project to the GraphToken space.
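
The set-based encoders described in the Figure 5 caption can be sketched roughly as follows. This is a hedged illustration only: the two-layer MLP, sum pooling, the choice to represent an edge by concatenating its endpoint features, and all names (`shared_mlp`, `set_encoder`, `edge_elements`) are assumptions, not details taken from the paper.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shared_mlp(rows, w1, w2):
    """Apply the same two-layer MLP to every set element (shared weights)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(rows, w1)]
    return matmul(hidden, w2)

def set_encoder(elements, w1, w2, w_proj):
    """Transform each element, sum-pool over the set, project to token space."""
    transformed = shared_mlp(elements, w1, w2)
    pooled = [[sum(col) for col in zip(*transformed)]]  # order-independent pooling
    return matmul(pooled, w_proj)[0]

def edge_elements(edges, feats):
    """Assumed edge representation: concatenated endpoint features."""
    return [feats[u] + feats[v] for u, v in edges]

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

feats = rand_matrix(4, 4)                 # hypothetical features for 4 nodes
edges = [(0, 1), (1, 2), (2, 3)]          # path graph as an edge list
w_proj = rand_matrix(8, 16)               # projection to a 16-dim token space

# Node-set variant: the set elements are the node features themselves.
node_w1, node_w2 = rand_matrix(4, 8), rand_matrix(8, 8)
node_token = set_encoder(feats, node_w1, node_w2, w_proj)

# Edge-set variant: the set elements are per-edge feature vectors.
edge_w1, edge_w2 = rand_matrix(8, 8), rand_matrix(8, 8)
edge_token = set_encoder(edge_elements(edges, feats), edge_w1, edge_w2, w_proj)

# Reordering the nodes should leave the pooled token unchanged (up to rounding).
shuffled_token = set_encoder(list(reversed(feats)), node_w1, node_w2, w_proj)
print(len(node_token), len(edge_token))
```

Because pooling sums over the set, the output does not depend on the order in which nodes or edges are listed, which is the property a set-based encoder is meant to provide.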
