GCoder: Improving Large Language Model for Generalized Graph Problem Solving

Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li

TL;DR

GCoder is a code-based LLM designed to improve problem-solving on generalized graph computation problems. It outperforms GPT-4o, with an average accuracy improvement of 16.42% across various graph computational problems.

Abstract

Large Language Models (LLMs) have demonstrated strong reasoning abilities, making them suitable for complex tasks such as graph computation. However, the traditional reasoning-step paradigm for graph problems is hindered by unverifiable steps, limited long-term reasoning, and poor generalization to graph variations. To overcome these limitations, we introduce GCoder, a code-based LLM designed to enhance problem-solving on generalized graph computation problems. Our method constructs an extensive training dataset, GraphWild, featuring diverse graph formats and algorithms. We employ a multi-stage training process, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Compiler Feedback (RLCF), to refine the model's capabilities. For unseen tasks, a hybrid retrieval technique is used to augment performance. Experiments demonstrate that GCoder outperforms GPT-4o, with an average accuracy improvement of 16.42% across various graph computational problems. Furthermore, GCoder efficiently handles large-scale graphs with millions of nodes and diverse input formats, overcoming the limitations of previous models built on the reasoning-step paradigm. This advancement paves the way for more intuitive and effective graph problem-solving with LLMs. Code and data are available at: https://github.com/Bklight999/WWW25-GCoder/tree/master.
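To make the contrast between the reasoning-step paradigm and the code paradigm concrete, the sketch below shows the kind of program a code-based model might emit for a shortest-path query, instead of narrating each reasoning step in text. It is a hypothetical illustration: the function name `shortest_path_length`, the edge-list input format, and the example graph are assumptions, not taken from GraphWild or from GCoder's actual outputs. Because the answer comes from executing a breadth-first search (linear in nodes plus edges), every step is verifiable and the same program scales to graphs far larger than a prompt could describe step by step.

```python
from collections import deque


def shortest_path_length(edges, source, target):
    """Return the number of edges on a shortest path from source to target
    in an unweighted, undirected graph, or -1 if target is unreachable.
    (Hypothetical example of model-generated code.)"""
    # Build an adjacency list from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Breadth-first search records each node's distance from source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return -1


# Illustrative graph: nodes 2 and 6 form a component separate from 0..5.
edges = [(0, 1), (1, 3), (3, 5), (0, 4), (4, 5), (2, 6)]
print(shortest_path_length(edges, 0, 5))  # 2 (path 0-4-5)
print(shortest_path_length(edges, 0, 2))  # -1 (unreachable)
```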

Paper Structure

This paper contains 19 sections, 5 equations, 19 figures, and 1 table.

Figures (19)

  • Figure 1: (a) Although the reasoning-step paradigm outputs the correct result, its intermediate reasoning can be wrong (here, the red reasoning step: node 2 is not connected to nodes 0 and 5). (b) Our code paradigm solves graph problems through programming. More examples can be found in the Appendix.
  • Figure 2: The workflow of our proposed GCoder.
  • Figure 3: Overview of the GCoder framework, which consists of two pipelines: (a) model fine-tuning and (b) model inference. In model fine-tuning, we apply two fine-tuning stages, SFT and RLCF, on our constructed GraphWild dataset. In model inference, in-domain query tasks directly prompt the tuned model for code generation, while out-of-domain tasks are enhanced with RAG. We execute the generated code and compare its output with the ground-truth answer.
  • Figure 4: RAG boosts performance on out-of-domain tasks. 0-shot denotes direct inference; 1-doc and 2-doc denote the number of chunks retrieved by RAG. We run RAG inference with two fine-tuned models (GCoder-L, based on Llama3.1-8B, and GCoder-Q, based on Qwen2.5-coder), and the results show the effectiveness of RAG.
  • Figure 5: Performance across graph sizes on the Bipartite and Shortest tasks. GCoder clearly outperforms the Qwen2.5-coder and GraphWiz baselines and maintains stable performance as graph size increases.
  • ...and 14 more figures
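The final step in Figure 3 (execute the generated code, then compare its output with the ground-truth answer) is also the kind of signal that compiler feedback can supply during RLCF. The sketch below is a minimal, hypothetical version of that step under stated assumptions: it runs generated Python in a separate interpreter with a timeout and assigns a simple binary reward. The function name `execute_and_score`, the status labels, and the 1.0/0.0 reward scheme are illustrative assumptions, not the paper's actual evaluation harness or reward definition.

```python
import subprocess
import sys


def execute_and_score(code: str, ground_truth: str, timeout: float = 10.0):
    """Run a generated Python program and compare its printed output with
    the ground-truth answer. Returns a (status, reward) pair.

    Hypothetical reward scheme: 1.0 for a correct answer, 0.0 otherwise.
    """
    try:
        # Run the code in a fresh interpreter so crashes cannot affect the caller.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout", 0.0
    if result.returncode != 0:
        # Non-zero exit status: syntax error or uncaught runtime exception.
        return "error", 0.0
    if result.stdout.strip() == ground_truth.strip():
        return "correct", 1.0
    return "wrong_answer", 0.0


print(execute_and_score("print(1 + 1)", "2"))  # ('correct', 1.0)
print(execute_and_score("print(1 / 0)", "2"))  # ('error', 0.0)
```

Isolating execution in a subprocess keeps a faulty or non-terminating program from crashing or stalling the evaluation loop. A real harness would likely add sandboxing and answer normalization beyond exact string matching.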