Let Your Graph Do the Talking: Encoding Structured Data for LLMs
Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, Jonathan Halcrow
TL;DR
This paper tackles the problem of efficiently encoding structured graph data for large language models. It introduces GraphToken, a parameter-efficient encoder that uses a GNN to produce a small set of graph tokens and projects them into the LLM embedding space; the LLM itself stays frozen during training. In extensive experiments on the GraphQA benchmark, GraphToken improves performance by up to 73 percentage points across graph-level, node-level, and edge-level tasks, and the analyses show that encoder choice and feature design critically influence results. The work demonstrates that explicit graph representations in the prompt space can significantly enhance LLM reasoning, offering a practical way to leverage structured data without large-scale fine-tuning.
Abstract
How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work, which focuses on limited domains (e.g., knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure yields significant improvements on graph reasoning tasks. Specifically, we see across-the-board improvements of up to 73 percentage points on node-, edge-, and graph-level tasks from the GraphQA benchmark.
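
To make the idea of "extending prompts with explicit structured information" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The mean-aggregation layer, the sum-pooling readout, the one-hot node features, the layer sizes, and every function name below are assumptions chosen only for illustration. What it shows is the overall shape of the approach: a small graph encoder produces a vector, a projection maps it to the LLM embedding width, and the result is prepended to the prompt's token embeddings. Only the encoder and projection weights would be trained; the LLM stays frozen.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    """Random weights; in practice these would be learned by gradient descent."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def message_passing_layer(node_feats, edges, W):
    """One round of mean aggregation over neighbors, then a linear map and ReLU.

    This particular layer is an illustrative assumption, not the paper's encoder.
    """
    n = len(node_feats)
    dim = len(node_feats[0])
    neighbors = [[i] for i in range(n)]  # self-loop keeps each node's own features
    for u, v in edges:  # edges are treated as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = []
    for i in range(n):
        agg = [
            sum(node_feats[j][d] for j in neighbors[i]) / len(neighbors[i])
            for d in range(dim)
        ]
        out.append(relu(matvec(W, agg)))
    return out


def encode_graph(node_feats, edges, gnn_weights, W_proj):
    """Return one graph token: GNN layers, sum-pool readout, projection to LLM width."""
    h = node_feats
    for W in gnn_weights:
        h = message_passing_layer(h, edges, W)
    pooled = [sum(col) for col in zip(*h)]  # sum over nodes, per feature dimension
    return matvec(W_proj, pooled)


def extend_prompt(graph_tokens, prompt_embeddings):
    """Prepend graph tokens to the (frozen) LLM's prompt token embeddings."""
    return graph_tokens + prompt_embeddings


# Toy example: a 4-node graph with hypothetical sizes.
rng = random.Random(0)
num_nodes, hidden_dim, llm_dim = 4, 4, 8
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
node_feats = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]

# Trainable parameters: the GNN layers and the projection. Nothing else is updated.
gnn_weights = [
    random_matrix(hidden_dim, num_nodes, rng),
    random_matrix(hidden_dim, hidden_dim, rng),
]
W_proj = random_matrix(llm_dim, hidden_dim, rng)

graph_token = encode_graph(node_feats, edges, gnn_weights, W_proj)

# Stand-in for the embeddings a frozen LLM would produce for a 3-token question.
prompt_embeddings = [[0.0] * llm_dim for _ in range(3)]
extended = extend_prompt([graph_token], prompt_embeddings)
```

In this sketch the extended sequence has one more position than the original prompt, and that extra position carries the graph information. A real setup would feed `extended` to the LLM in place of its usual input embeddings and backpropagate the task loss through the frozen model into `gnn_weights` and `W_proj` only.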
