Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu, Yongcheng Jing, Yingjie Wang, Wenbin Hu, Dacheng Tao
TL;DR
The paper addresses two weaknesses of small or resource-constrained LLMs: they struggle with deep, domain-specific mathematics, and they are prone to hallucination during multi-step reasoning. It proposes KG-RAR, a training-free framework that interleaves step-wise knowledge-graph retrieval with step-wise reasoning, supported by a Post-Retrieval Processing and Reward Model (PRP-RM) that refines the retrieved context and scores each step. A process-oriented Math Knowledge Graph (MKG) encodes procedural reasoning and its dependencies, and a hierarchical retrieval strategy returns targeted, context-rich subgraphs to guide each reasoning step. Experiments on Math500 and GSM8K across six models show improvements over standard CoT prompting (for example, a 20.73% relative improvement with Llama-3B on Math500) and performance competitive with fine-tuned reward models, which supports the practicality of external, structured knowledge augmentation for o1-like reasoning in frozen LLMs.
Abstract
Recent large language model (LLM) reasoning, despite its success, suffers from limited domain knowledge, susceptibility to hallucinations, and constrained reasoning depth, particularly in small-scale models deployed in resource-constrained environments. This paper presents the first investigation into integrating step-wise knowledge graph retrieval with step-wise reasoning to address these challenges, introducing a novel paradigm termed graph-augmented reasoning. Our goal is to enable frozen, small-scale LLMs to retrieve and process relevant mathematical knowledge in a step-wise manner, enhancing their problem-solving abilities without additional training. To this end, we propose KG-RAR, a framework centered on process-oriented knowledge graph construction, a hierarchical retrieval strategy, and a universal post-retrieval processing and reward model (PRP-RM) that refines retrieved information and evaluates each reasoning step. Experiments on the Math500 and GSM8K benchmarks across six models demonstrate that KG-RAR yields encouraging results, achieving a 20.73% relative improvement with Llama-3B on Math500.
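To make the step-wise loop concrete, the following is a minimal illustrative sketch, not the authors' implementation. The `Node` schema, `TOY_GRAPH`, `retrieve`, and `solve` are hypothetical names introduced here. Word overlap stands in for whatever retrieval scoring KG-RAR actually uses, and the caller-supplied `propose` and `score` functions stand in for the frozen LLM and the PRP-RM respectively.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a toy process-oriented knowledge graph (hypothetical schema)."""
    name: str
    topic: str   # coarse label used by the first retrieval stage
    text: str    # procedural hint attached to the node
    requires: list = field(default_factory=list)  # names of prerequisite nodes


# Hand-written toy data for illustration only; not taken from the paper's MKG.
TOY_GRAPH = {
    "isolate_variable": Node(
        "isolate_variable", "algebra",
        "subtract the constant from both sides to isolate the variable term",
        requires=["equality_property"],
    ),
    "equality_property": Node(
        "equality_property", "algebra",
        "the same operation applied to both sides preserves equality",
    ),
    "triangle_area": Node(
        "triangle_area", "geometry",
        "area equals half of base times height",
    ),
}


def tokens(s):
    """Lowercased whitespace tokens; a crude stand-in for a real similarity measure."""
    return set(s.lower().split())


def retrieve(graph, topic, step_text, k=1):
    """Two-stage retrieval: filter by topic, then rank by word overlap with the step."""
    candidates = [n for n in graph.values() if n.topic == topic]
    ranked = sorted(
        candidates,
        key=lambda n: len(tokens(n.text) & tokens(step_text)),
        reverse=True,
    )
    # Expand each hit with its prerequisites so the subgraph carries dependencies.
    names = []
    for node in ranked[:k]:
        for name in [node.name] + node.requires:
            if name not in names:
                names.append(name)
    return [graph[name] for name in names]


def solve(question, topic, graph, propose, score, max_steps=4):
    """Per step: retrieve context, ask for candidate steps, keep the best-scored one."""
    steps = []
    for _ in range(max_steps):
        state = " ".join([question] + steps)
        context = retrieve(graph, topic, state)
        candidates = propose(question, steps, context)  # stand-in for the frozen LLM
        if not candidates:
            break
        # score() is a stand-in for a step-level reward model such as PRP-RM.
        steps.append(max(candidates, key=lambda c: score(c, context)))
    return steps
```

In this sketch, retrieval is re-run at every step against the question plus the steps accepted so far, so the context can shift as the partial solution grows. The model is never updated, which mirrors the training-free setting described above.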
