GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability

Zihan Luo, Xiran Song, Hong Huang, Jianxun Lian, Chenhao Zhang, Jinqi Jiang, Xing Xie, Hai Jin

TL;DR

GraphInstruct presents a dynamic, instruction-tuning-oriented benchmark with 21 classical graph reasoning tasks and explicit intermediate steps to advance LLMs’ graph understanding. It introduces GraphSolver, an instruction-tuned model, and GraphSolver+ with a label-mask strategy to enhance multi-step graph reasoning, validated through extensive experiments showing superiority over several open LLMs and competitiveness with GPT-3.5 Turbo. The work demonstrates robust graph-understanding improvements and notable gains in reasoning tasks, while identifying challenges on complex, large-scale, and out-of-domain problems. By releasing code and data, GraphInstruct aims to catalyze broad progress in applying LLMs to graph-structured data and related reasoning tasks.

Abstract

Improving the general capabilities of large language models (LLMs) is an active research topic. Graphs are a common data structure in many real-world domains, so understanding graph data is a crucial part of advancing general intelligence. To this end, we propose GraphInstruct, a dynamic benchmark that covers 21 classical graph reasoning tasks and provides diverse graph generation pipelines and detailed intermediate reasoning steps for each sample. Based on GraphInstruct, we develop GraphSolver via efficient instruction-tuning, which demonstrates prominent graph understanding capability compared to other open-source LLMs. To further endow LLMs with multi-step graph reasoning capability, we propose a label-mask training strategy and build GraphSolver+, which leverages masked supervision on intermediate reasoning tokens to emphasize crucial node-identification signals. As one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, this work reports extensive experiments that demonstrate the superiority of GraphSolver and GraphSolver+ over other LLMs. We sincerely hope GraphInstruct will facilitate further research on applying LLMs to graph-structured data. Our code and data are released publicly at: https://github.com/CGCL-codes/GraphInstruct.
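
The abstract describes the label-mask strategy only at a high level, so the following Python sketch is an illustration of the general idea of masked supervision rather than the authors' implementation. It assumes each target token carries a boolean flag saying whether it should be supervised (for example, tokens that name a node in an intermediate step); how GraphSolver+ actually chooses those tokens is not stated here. The helper names `build_masked_labels` and `masked_cross_entropy` are hypothetical, and the sentinel -100 simply follows the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import math

IGNORE_INDEX = -100  # same sentinel PyTorch's CrossEntropyLoss ignores by default


def build_masked_labels(token_ids, supervise_flags, ignore_index=IGNORE_INDEX):
    """Keep the label where the flag is True; otherwise hide it from the loss.

    Assumption: supervise_flags marks the tokens we want to emphasize.
    The selection rule itself is hypothetical and supplied by the caller.
    """
    if len(token_ids) != len(supervise_flags):
        raise ValueError("token_ids and supervise_flags must have equal length")
    return [tok if keep else ignore_index
            for tok, keep in zip(token_ids, supervise_flags)]


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignored."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position contributes nothing to the loss
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


# Toy example: five target tokens over a vocabulary of five symbols.
token_ids = [3, 1, 4, 1, 2]
supervise_flags = [False, True, False, True, True]  # hypothetical flags
labels = build_masked_labels(token_ids, supervise_flags)

# Uniform logits make the per-token loss equal to log(vocabulary size).
uniform_logits = [[0.0] * 5 for _ in token_ids]
loss = masked_cross_entropy(uniform_logits, labels)
```

Only the three flagged positions enter the average, so the gradient signal in a real training loop would concentrate on those tokens while the remaining positions are skipped.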

Paper Structure

This paper contains 32 sections, 4 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Case study of a graph reasoning task that general LLMs such as GPT-3.5 Turbo fail to handle
  • Figure 2: The overview of GraphInstruct benchmark. We provide diverse options during the construction of GraphInstruct, including graph structure distributions, graph sizes, graph description languages, and node IDs. For improving the reasoning capability of LLMs, GraphInstruct also provides precise intermediate results for each task.
  • Figure 3: The complete task schema of GraphInstruct
  • Figure 4: Accuracy comparison of several LLMs on 17 in-domain tasks of GraphInstruct
  • Figure 5: Performance comparisons on both GraphInstruct-specific tasks and shared tasks with two competitor models
  • ...and 5 more figures