Investigating Instruction Tuning Large Language Models on Graphs

Kerui Zhu; Bo-Wei Huang; Bowen Jin; Yizhu Jiao; Ming Zhong; Kevin Chang; Shou-De Lin; Jiawei Han

TL;DR

This work addresses how instruction-tuned LLMs can learn and generalize on graph-structured data. It builds a fine-grained benchmark across two domains (Amazon Metadata and MAPLE) with 79 sub-tasks over 14 tasks, and compares natural language, JSON, and DOT representations, finding JSON to be the most effective bridge for graph understanding. By evaluating unseen sub-tasks, domains, and answer types, the study reveals strengths and limits of graph instruction tuning, such as robust generalization for simple graph algorithms but vulnerabilities for counting and inductive reasoning tasks. The results demonstrate that graph-tuned LLMs can derive and apply graph algorithms beyond training, with practical implications for creating versatile, instruction-following graph solvers using JSON-based representations and parameter-efficient fine-tuning.

Abstract

Inspired by the recent advancements of Large Language Models (LLMs) in NLP tasks, there is growing interest in applying LLMs to graph-related tasks. This study examines the capabilities of instruction-following LLMs on real-world graphs, aiming to offer empirical insights into how LLMs can effectively interact with graphs and generalize across graph tasks. We begin by constructing a dataset designed for instruction tuning, which comprises a diverse collection of 79 graph-related tasks from academic and e-commerce domains, with 44,240 training instances and 18,960 test samples. Using this benchmark, we first investigate which graph representation best serves as a conduit for LLMs to understand complex graph structures. Our findings indicate that the JSON format for graph representation consistently outperforms natural language and code formats across various LLMs and graph types. Furthermore, we examine the key factors that influence the generalization abilities of instruction-tuned LLMs by evaluating their performance on both in-domain and out-of-domain graph tasks.
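To make the comparison of representations concrete, the following is a minimal sketch of how one small graph might be serialized as natural language, JSON, and DOT. The node IDs, relation names, JSON layout, and the helper functions `to_natural_language`, `to_json`, and `to_dot` are all hypothetical and chosen for illustration; the exact prompt formats used in the paper are not given in this summary.

```python
import json

# Hypothetical toy graph; IDs, types, and relations are illustrative only.
nodes = [
    {"id": "paper_0", "type": "paper"},
    {"id": "paper_1", "type": "paper"},
    {"id": "author_0", "type": "author"},
]
edges = [
    {"source": "author_0", "target": "paper_0", "relation": "writes"},
    {"source": "author_0", "target": "paper_1", "relation": "writes"},
    {"source": "paper_0", "target": "paper_1", "relation": "cites"},
]


def to_natural_language(nodes, edges):
    """Describe the graph as plain sentences (assumed phrasing)."""
    sentences = [f"Node {n['id']} has type {n['type']}." for n in nodes]
    sentences += [f"{e['source']} {e['relation']} {e['target']}." for e in edges]
    return " ".join(sentences)


def to_json(nodes, edges):
    """Serialize the graph as a JSON object with node and edge lists (assumed layout)."""
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)


def to_dot(nodes, edges):
    """Serialize the graph as a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    for n in nodes:
        lines.append(f'  "{n["id"]}" [type="{n["type"]}"];')
    for e in edges:
        lines.append(
            f'  "{e["source"]}" -> "{e["target"]}" [label="{e["relation"]}"];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_natural_language(nodes, edges))
    print(to_json(nodes, edges))
    print(to_dot(nodes, edges))
```

All three strings carry the same nodes and edges; only the surface form that the model reads differs, which is the variable the representation comparison isolates.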

Paper Structure (32 sections, 1 equation, 5 figures, 5 tables)

Introduction
Related Work
LLMs on Graphs
Instruction Tuning for LLMs
Instruction Tuning on Graph
Preliminaries
Task Definition
Evaluation Splits
Data Collection
Graph Sampling
Node De-identification
Question-Graph Collection
Graph Representation
Graph Instruction Tuning
Experiments
...and 17 more sections

Figures (5)

Figure 1: Examples of graph representations and three levels of generalization.
Figure 2: Compare LLMs of different scales using three graph representations.
Figure 3: Experiment results of sub-task generalization on two datasets.
Figure 4: Compare LLMs of different scales on domain generalization.
Figure 5: Case study on finding the shortest path between two non-product nodes in the Amazon dataset, depicted through a graph in JSON format.
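The task in the Figure 5 case study, finding a shortest path between two nodes of a JSON-encoded graph, can be sketched as a breadth-first search. This is only an illustration of the underlying algorithm, assuming an unweighted graph whose edges are treated as undirected. The JSON layout, the node IDs, and the function `shortest_path` are hypothetical and are not taken from the paper's data.

```python
import json
from collections import deque

# Hypothetical JSON graph; node IDs and layout are illustrative only.
graph_json = """
{
  "edges": [
    {"source": "brand_0", "target": "product_0"},
    {"source": "product_0", "target": "category_0"},
    {"source": "brand_0", "target": "product_1"},
    {"source": "product_1", "target": "category_1"}
  ]
}
"""


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal via BFS, or None if unreachable."""
    # Build an undirected adjacency list from the edge list.
    adjacency = {}
    for edge in graph["edges"]:
        adjacency.setdefault(edge["source"], []).append(edge["target"])
        adjacency.setdefault(edge["target"], []).append(edge["source"])

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


graph = json.loads(graph_json)

if __name__ == "__main__":
    print(shortest_path(graph, "brand_0", "category_0"))
```

In this toy graph, the two non-product endpoints are connected only through a product node, so any valid path must pass through one.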
