Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning

Wenjie Wu, Yongcheng Jing, Yingjie Wang, Wenbin Hu, Dacheng Tao

TL;DR

The paper tackles the challenge that small or resource-constrained LLMs struggle with deep, domain-specific mathematics and are prone to hallucinations during multi-step reasoning. It proposes KG-RAR, a training-free framework that integrates stepwise knowledge-graph retrieval with iterative reasoning, supported by a Post-Retrieval Processing and Reward Model (PRP-RM) to refine context and score steps. A process-oriented Math Knowledge Graph (MKG) is constructed to encode procedural reasoning and dependencies, and a hierarchical retrieval strategy enables targeted, context-rich subgraphs to guide each reasoning step. Experimental results on Math500 and GSM8K across six models show meaningful improvements over standard CoT prompts and competitive performance relative to fine-tuned reward models, demonstrating the practicality of external, structured knowledge augmentation for o1-like reasoning in frozen LLMs.

Abstract

Recent large language model (LLM) reasoning, despite its success, suffers from limited domain knowledge, susceptibility to hallucinations, and constrained reasoning depth, particularly in small-scale models deployed in resource-constrained environments. This paper presents the first investigation into integrating step-wise knowledge graph retrieval with step-wise reasoning to address these challenges, introducing a novel paradigm termed graph-augmented reasoning. Our goal is to enable frozen, small-scale LLMs to retrieve and process relevant mathematical knowledge in a step-wise manner, enhancing their problem-solving abilities without additional training. To this end, we propose KG-RAR, a framework centered on process-oriented knowledge graph construction, a hierarchical retrieval strategy, and a universal post-retrieval processing and reward model (PRP-RM) that refines retrieved information and evaluates each reasoning step. Experiments on the Math500 and GSM8K benchmarks across six models demonstrate that KG-RAR yields encouraging results, achieving a 20.73% relative improvement with Llama-3B on Math500.

Paper Structure

This paper contains 18 sections, 12 equations, 13 figures, 2 tables, 2 algorithms.

Figures (13)

  • Figure 1: Illustration of the proposed step-by-step knowledge graph retrieval for o1-like reasoning, which dynamically retrieves and utilises structured sub-graphs (Sub-KGs) during reasoning. Our approach iteratively refines the reasoning process by retrieving relevant Sub-KGs at each step, enhancing accuracy, consistency, and reasoning depth for complex tasks, thereby offering a novel form of scaling test-time computation.
  • Figure 2: Example of Step-by-Step KG-RAR's iterative process: 1) Retrieving: For a given question or intermediate reasoning step, the KG is queried to find the most similar problem or procedure (underlined in the figure), and its subgraph is extracted as the raw retrieval. 2) Refining: A frozen LLM processes the raw retrieval to generate a refined and targeted context for reasoning. 3) Reasoning: Using the refined retrieval, another LLM reflects on previous steps and generates the next intermediate reasoning steps. This iterative workflow refines and guides the reasoning path to problem-solving.
  • Figure 3: Pipeline for constructing the process-oriented math knowledge graph from process supervision datasets.
  • Figure 4: Illustration of the Post-Retrieval Processing and Reward Model (PRP-RM). Given a problem $P$ and its retrieved context $\mathcal{R}_p$ from the Knowledge Graph (KG), PRP-RM refines it into $\mathcal{R'}_p$. The Reasoner LLM generates step $S_1$ based on $\mathcal{R'}_p$, followed by iterative retrieval and refinement ($\mathcal{R}_t \to \mathcal{R'}_t$) for each step $S_t$. Correctness is assessed using $I =$ "Is this step correct?" to compute $\operatorname{Score}(S_t)$, while completion is checked via $I_E =$ "Has a final answer been reached?" to compute $\operatorname{End}(S_t)$. The process continues until $\operatorname{End}(S_t)$ surpasses a threshold or a predefined inference depth is reached.
  • Figure 5: Comparison of reward models under Last@8.
  • ...and 8 more figures
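
The iterative workflow described in the captions of Figures 2 and 4 can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kg_rar_loop`, the five callables (`retrieve`, `refine`, `reason`, `score`, `end_prob`), and the default values for `end_threshold` and `max_depth` are all hypothetical placeholders. Only the control flow (retrieve, refine, reason, score, then stop once the end signal passes a threshold or a depth limit is reached) follows the captions.

```python
from typing import Callable, List, Tuple


def kg_rar_loop(
    problem: str,
    retrieve: Callable[[str], str],                    # query -> raw sub-KG text (R_t)
    refine: Callable[[str, str], str],                 # (problem, R_t) -> refined context (R'_t)
    reason: Callable[[str, List[str], str], str],      # (problem, prior steps, R'_t) -> S_t
    score: Callable[[str, List[str], str], float],     # "Is this step correct?" -> Score(S_t)
    end_prob: Callable[[str, List[str]], float],       # "Has a final answer been reached?" -> End(S_t)
    end_threshold: float = 0.5,                        # assumed value, not from the paper
    max_depth: int = 8,                                # assumed value, not from the paper
) -> List[Tuple[str, float]]:
    """Run the retrieve -> refine -> reason loop and return (step, score) pairs."""
    steps: List[str] = []
    scored: List[Tuple[str, float]] = []
    query = problem  # the first retrieval is keyed on the problem itself (R_p)

    for _ in range(max_depth):
        raw = retrieve(query)                      # raw retrieval from the KG
        context = refine(problem, raw)             # post-retrieval processing
        step = reason(problem, steps, context)     # next intermediate reasoning step
        steps.append(step)
        scored.append((step, score(problem, steps, context)))

        # Stop once the completion signal surpasses the threshold.
        if end_prob(problem, steps) > end_threshold:
            break

        query = step  # later retrievals are keyed on the latest step

    return scored
```

In practice each callable would wrap a KG query or a prompt to a frozen LLM; passing them in as parameters keeps the loop itself independent of any particular model or graph store.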