Table of Contents
Fetching ...

GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang

TL;DR

An agentic hierarchical retrieval-augmented coding framework that exploits the document hierarchy through top-down traversal and early pruning, together with a self-debugging coding agent that iteratively refines code using automatically generated small-scale test cases is proposed.

Abstract

The growing demand for automated graph algorithm reasoning has attracted increasing attention in the large language model (LLM) community. Recent LLM-based graph reasoning methods typically decouple task descriptions from graph data, generate executable code augmented by retrieval from technical documentation, and refine the code through debugging. However, we identify two key limitations in existing approaches: (i) they treat technical documentation as flat text collections and ignore its hierarchical structure, leading to noisy retrieval that degrades code generation quality; and (ii) their debugging mechanisms focus primarily on runtime errors, yet ignore more critical logical errors. To address them, we propose {\method}, an \textit{agentic hierarchical retrieval-augmented coding framework} that exploits the document hierarchy through top-down traversal and early pruning, together with a \textit{self-debugging coding agent} that iteratively refines code using automatically generated small-scale test cases. To enable comprehensive evaluation of complex graph reasoning, we introduce a new dataset, {\dataset}, covering small-scale, large-scale, and composite graph reasoning tasks. Extensive experiments demonstrate that our method achieves higher task accuracy and lower inference cost compared to baselines\footnote{The code is available at \href{https://github.com/FairyFali/GraphSkill}{\textcolor{blue}{https://github.com/FairyFali/GraphSkill}}.}.

GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

TL;DR

An agentic hierarchical retrieval-augmented coding framework that exploits the document hierarchy through top-down traversal and early pruning, together with a self-debugging coding agent that iteratively refines code using automatically generated small-scale test cases is proposed.

Abstract

The growing demand for automated graph algorithm reasoning has attracted increasing attention in the large language model (LLM) community. Recent LLM-based graph reasoning methods typically decouple task descriptions from graph data, generate executable code augmented by retrieval from technical documentation, and refine the code through debugging. However, we identify two key limitations in existing approaches: (i) they treat technical documentation as flat text collections and ignore its hierarchical structure, leading to noisy retrieval that degrades code generation quality; and (ii) their debugging mechanisms focus primarily on runtime errors, yet ignore more critical logical errors. To address them, we propose {\method}, an \textit{agentic hierarchical retrieval-augmented coding framework} that exploits the document hierarchy through top-down traversal and early pruning, together with a \textit{self-debugging coding agent} that iteratively refines code using automatically generated small-scale test cases. To enable comprehensive evaluation of complex graph reasoning, we introduce a new dataset, {\dataset}, covering small-scale, large-scale, and composite graph reasoning tasks. Extensive experiments demonstrate that our method achieves higher task accuracy and lower inference cost compared to baselines\footnote{The code is available at \href{https://github.com/FairyFali/GraphSkill}{\textcolor{blue}{https://github.com/FairyFali/GraphSkill}}.}.
Paper Structure (38 sections, 15 equations, 7 figures, 10 tables, 1 algorithm)

This paper contains 38 sections, 15 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Error distribution across various LLMs. (b) Text-based reasoning accuracy (%) across graph scales on ComplexGraph using different LLMs.
  • Figure 2: Overview of the Agentic Hierarchical Retrieval-Augmented Self-debugging Coding Framework.
  • Figure 3: The hierarchical retrieval agent workflow.
  • Figure 4: Retrieval performance and time on small-scale subset.
  • Figure 5: Sensitivity analysis of the hyper-parameter w.r.t. self-debugging budget $T_{\max}$ on ComplexGraph-L.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Retrieval-augmented Graph Reasoning by Coding