GraphWiz: An Instruction-Following Language Model for Graph Problems
Nuo Chen, Yuhan Li, Jianheng Tang, Jia Li
TL;DR
This work tackles the challenge of enabling open-source LLMs to solve diverse graph problems with explicit, auditable reasoning. It introduces GraphInstruct, a large instruction-tuning dataset spanning nine graph tasks, and GraphWiz, an open-source model trained in two stages (mixed-task SFT and DPO alignment) to generate step-by-step reasoning alongside answers. GraphWiz, particularly with DPO, outperforms GPT-4 on average across tasks (GraphWiz-DPO: 65% average accuracy versus 43.8% for GPT-4). The analysis also examines how training data volume affects performance, including the risk of overfitting, how reasoning ability transfers across graph tasks, and how performance scales with graph size. The work presents a blueprint for graph-focused reasoning in LLMs and demonstrates strong potential for cross-task generalization and interpretability in graph algorithms.
Abstract
Large language models (LLMs) have achieved impressive success across several fields, but their proficiency in understanding and resolving complex graph problems is less explored. To bridge this gap, we introduce GraphInstruct, a novel and comprehensive instruction-tuning dataset designed to equip language models with the ability to tackle a broad spectrum of graph problems using explicit reasoning paths. Utilizing GraphInstruct, we build GraphWiz, an open-source language model capable of resolving various graph problem types while generating clear reasoning processes. To enhance the model's capability and reliability, we incorporate the Direct Preference Optimization (DPO) framework into the graph problem-solving context. The enhanced model, GraphWiz-DPO, achieves an average accuracy of 65% across nine tasks with different complexity levels, surpassing GPT-4, which has an average accuracy of 43.8%. Moreover, our research delves into the delicate balance between training data volume and model performance, highlighting the potential for overfitting with increased data. We also explore the transferability of the model's reasoning ability across different graph tasks, indicating the model's adaptability and practical application potential. Our investigation offers a new blueprint and valuable insights for developing LLMs specialized in graph reasoning and problem-solving.
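To make the DPO stage mentioned above more concrete, the following is a minimal sketch of the standard DPO objective for a single preference pair (a preferred reasoning path versus a rejected one). It assumes the generic DPO formulation; the abstract does not state the exact loss variant or hyperparameters used for GraphWiz-DPO, so the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions rather than the authors' implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each *_logp argument is the summed log-probability of a full response
    under either the policy being trained or the frozen reference model.
    beta is a hypothetical default, not a value taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins.
    z = beta * (chosen_margin - rejected_margin)

    # Loss is -log(sigmoid(z)), computed in a numerically stable form.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and reference agree on both responses the loss equals log 2, and it falls below that value as the policy shifts probability toward the preferred reasoning path relative to the reference.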
