
GraphTool-Instruction: Revolutionizing Graph Reasoning in LLMs through Decomposed Subtask Instruction

Rongzheng Wang, Shuang Liang, Qizhi Chen, Jiasheng Zhang, Ke Qin

TL;DR

This paper introduces GraphTool-Instruction, a decomposed instruction-tuning approach for graph reasoning in LLMs that splits tasks into graph extraction, tool-name identification, and tool-parameter extraction. It constructs GTools, a 40,000-instance dataset across 20 graph-reasoning tasks, and fine-tunes an open-source LLM, GraphForge (based on Llama3-8B), via LoRA on this dataset. The authors demonstrate state-of-the-art performance against Text-Instruction and Tool-Instruction baselines on both WL-Graph and EL-Graph, achieving an average accuracy of around 98% and approaching GPT-4o-FC in many cases. The work also provides a detailed error analysis and introduces three evaluation metrics (Graph, Tool Name, Tool Parameter) to assess tool execution reliability, underscoring the method’s robustness and potential for real-world graph reasoning tasks.

Abstract

Large language models (LLMs) have been demonstrated to possess the capabilities to understand fundamental graph properties and address various graph reasoning tasks. Existing methods fine-tune LLMs to understand and execute graph reasoning tasks by specially designed task instructions. However, these Text-Instruction methods generally exhibit poor performance. Inspired by tool learning, researchers propose Tool-Instruction methods to solve various graph problems by special tool calling (e.g., function, API and model), achieving significant improvements in graph reasoning tasks. Nevertheless, current Tool-Instruction approaches focus on the tool information and ignore the graph structure information, which leads to significantly inferior performance on small-scale LLMs (less than 13B). To tackle this issue, we propose GraphTool-Instruction, an innovative Instruction-tuning approach that decomposes the graph reasoning task into three distinct subtasks (i.e., graph extraction, tool name identification and tool parameter extraction), and design specialized instructions for each subtask. Our GraphTool-Instruction can be used as a plug-and-play prompt for different LLMs without fine-tuning. Moreover, building on GraphTool-Instruction, we develop GTools, a dataset that includes twenty graph reasoning tasks, and create a graph reasoning LLM called GraphForge based on Llama3-8B. We conduct extensive experiments on twenty graph reasoning tasks with different graph types (e.g., graph size or graph direction), and we find that GraphTool-Instruction achieves SOTA compared to Text-Instruction and Tool-Instruction methods. Fine-tuned on GTools, GraphForge gets further improvement of over 30% compared to the Tool-Instruction enhanced GPT-3.5-turbo, and it performs comparably to the high-cost GPT-4o. Our codes and data are available at https://anonymous.4open.science/r/GraphTool-Instruction.
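The three-subtask decomposition described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: in the paper each subtask is performed by the LLM under its own instruction, whereas here the LLM outputs are stubbed as plain arguments. The tool names (`cycle_detection`, `shortest_path`), the edge-list text format, and the `run` helper are all hypothetical.

```python
import re
from collections import deque


def has_cycle(edges):
    """Hypothetical BGA-style tool: undirected cycle detection via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # endpoints already connected, so this edge closes a cycle
            return True
        parent[ru] = rv
    return False


def shortest_path_length(edges, source, target):
    """Hypothetical PGQ-style tool: BFS hop count on an undirected graph."""
    if source == target:
        return 0
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target unreachable


# Assumed registry: tool name -> (function, required parameter names).
TOOLS = {
    "cycle_detection": (has_cycle, []),
    "shortest_path": (shortest_path_length, ["source", "target"]),
}


def parse_graph(graph_text):
    """Subtask 1 (graph extraction): read '(u, v)' pairs into an edge list."""
    pairs = re.findall(r"\((\d+)\s*,\s*(\d+)\)", graph_text)
    return [(int(u), int(v)) for u, v in pairs]


def run(graph_text, tool_name, params=None):
    """Combine the three subtask outputs and execute the selected tool."""
    edges = parse_graph(graph_text)          # subtask 1: graph extraction
    func, required = TOOLS[tool_name]        # subtask 2: tool name identification
    params = params or {}                    # subtask 3: tool parameter extraction
    missing = [p for p in required if p not in params]
    if missing:
        raise ValueError(f"missing tool parameters: {missing}")
    return func(edges, **{p: params[p] for p in required})
```

For instance, `run("(0, 1), (1, 2), (2, 0)", "cycle_detection")` needs no parameters, while `run("(0, 1), (1, 2)", "shortest_path", {"source": 0, "target": 2})` fails early if either parameter is absent. Keeping the three outputs separate makes it possible to tell which subtask an error came from.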

Paper Structure

This paper contains 22 sections, 17 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: (a) Text-Instruction method represented by GraphWiz; (b) Tool-Instruction method represented by Graph-ToolFormer; (c) GraphTool Instruction-tuning method.
  • Figure 2: Overview of how an LLM solves graph reasoning tasks based on GraphTool-Instruction. A Basic Graph Analysis Task (BGA-Task) does not require additional tool parameters, whereas a Parametric Graph Query Task (PGQ-Task) requires specific input tool parameters for reasoning. WL-Graph denotes a task whose length is within 4096 tokens, while EL-Graph denotes a task that exceeds this limit. The red arrow shows the BGA-Task reasoning process, and the blue arrow shows the PGQ-Task process, which additionally introduces Parameter-Instruction to enhance the accuracy of parameter extraction.
  • Figure 3: Graph reasoning based on GraphTool-Instruction.
  • Figure 4: Impact of graph, tool name, and tool parameter accuracies on overall answer accuracy. Notably, both Cycle Detection and Maximum Triangle Sum are BGA-Tasks, so there is no result for Parameter Accuracy.
  • Figure 5: Error analysis on GPT-3.5-turbo-FC, GLM-0520-FC, Graph-ToolFormer and GraphForge. "Mis" is short for "Mismatch".
  • ...and 10 more figures
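
The three accuracies named in the Figure 4 caption (graph, tool name, tool parameter) could be computed along the following lines. This is only a sketch under assumed definitions: exact match on the edge set, the tool name, and the parameter dictionary. The record fields (`pred_edges`, `gold_tool`, and so on) are hypothetical, and the paper's exact metric definitions may differ. BGA-Task records are marked with `gold_params = None` and excluded from parameter accuracy, mirroring the caption's note that such tasks have no Parameter Accuracy result.

```python
def subtask_accuracies(records):
    """Return exact-match accuracy per subtask over a list of record dicts."""
    if not records:
        raise ValueError("records must be non-empty")
    n = len(records)

    # Graph accuracy: extracted edge set equals the reference edge set.
    graph_hits = sum(
        set(r["pred_edges"]) == set(r["gold_edges"]) for r in records
    )
    # Tool name accuracy: the identified tool matches the reference tool.
    name_hits = sum(r["pred_tool"] == r["gold_tool"] for r in records)

    # Tool parameter accuracy: only PGQ-style records carry parameters.
    pgq = [r for r in records if r["gold_params"] is not None]
    param_acc = None
    if pgq:
        param_acc = sum(r["pred_params"] == r["gold_params"] for r in pgq) / len(pgq)

    return {
        "graph": graph_hits / n,
        "tool_name": name_hits / n,
        "tool_parameter": param_acc,
    }
```

Reporting `None` rather than `0.0` when no record has parameters keeps a BGA-only task set from looking like a parameter-extraction failure.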