Guideline Forest: Retrieval-Augmented Reasoning with Branching Experience-Induced Guidelines
Jiaxiang Chen, Zhuo Wang, Mingxi Zou, Qifan Wang, Zenglin Xu
TL;DR
Guideline Forest is a memory-augmented, retrieval-guided reasoning framework that stores verified, label-consistent reasoning traces as reusable experience and induces structured guidelines from them to steer multi-step problem solving. For a new problem, it retrieves relevant reasoning trajectories, executes multiple guideline-driven branches, and aggregates them stepwise, optionally across multiple models. On math and code benchmarks it outperforms strong baselines such as CoT, ReAct, ToT, FoT, and AFlow. Ablation studies indicate that selective retrieval, path diversity, and early-step aggregation each contribute to the gains, and further experiments show that the approach can enhance other reasoning methods and supports cross-model collaboration. The work suggests a practical path toward more transparent, cooperative, and efficient reasoning in large language models.
Abstract
Retrieval-augmented generation (RAG) has been widely adopted to ground large language models (LLMs) in external knowledge, yet its use for improving reasoning remains largely underexplored. Existing methods rely either on online exploration during inference or on heuristic supervision over reasoning trajectories, and neither effectively accumulates and reuses past reasoning experience. We propose Guideline Forest, a retrieval-augmented reasoning framework that explicitly leverages experience to guide multi-step reasoning. The framework stores high-quality, label-consistent reasoning traces as reusable memory, retrieves relevant experiences for new problems, and distills them into structured guidelines that steer reasoning and enable controlled branching and aggregation. Experiments on mathematical (GSM8K, MATH-500) and programming (MBPP, HumanEval) benchmarks demonstrate consistent improvements over strong reasoning baselines, including CoT, ReAct, ToT, FoT, and AFlow. Further analyses show that experience retrieval, guideline-induced diversity, and stepwise aggregation are key to the framework's effectiveness. Beyond single-model reasoning, Guideline Forest generalizes to enhance diverse reasoning paradigms and supports multi-model collaboration, highlighting its flexibility and scalability.
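To make the pipeline described above concrete, the following is a minimal Python sketch of its four stages: label-consistent storage, retrieval, guideline induction, and branching with aggregation. It is an illustration under stated assumptions, not the authors' implementation. All names (`Trace`, `ExperienceMemory`, `induce_guideline`, `solve`, `run_branch`) are hypothetical. Token-overlap (Jaccard) retrieval and a majority vote over final answers are simple stand-ins chosen for brevity; the abstract does not specify the retriever, and the paper's stepwise aggregation is reduced here to a single final vote.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    """One stored reasoning experience (hypothetical structure)."""
    question: str
    steps: List[str]
    answer: str


class ExperienceMemory:
    """Keeps only traces whose final answer matches the known label."""

    def __init__(self) -> None:
        self.traces: List[Trace] = []

    def add(self, trace: Trace, label: str) -> bool:
        # Label-consistency filter: reject traces that reach a wrong answer.
        if trace.answer != label:
            return False
        self.traces.append(trace)
        return True

    def retrieve(self, question: str, k: int = 2) -> List[Trace]:
        # Assumption: Jaccard overlap on lowercase tokens stands in for
        # whatever retriever the framework actually uses.
        query = set(question.lower().split())

        def score(trace: Trace) -> float:
            tokens = set(trace.question.lower().split())
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        return sorted(self.traces, key=score, reverse=True)[:k]


def induce_guideline(trace: Trace) -> List[str]:
    # Placeholder: a real system would prompt an LLM to abstract the trace
    # into reusable steps. Here the stored steps are copied unchanged.
    return list(trace.steps)


def solve(
    question: str,
    memory: ExperienceMemory,
    run_branch: Callable[[str, List[str]], str],
    k: int = 2,
) -> Optional[str]:
    # One reasoning branch per retrieved experience.
    guidelines = [induce_guideline(t) for t in memory.retrieve(question, k)]
    answers = [run_branch(question, g) for g in guidelines]
    if not answers:
        return None
    # Simplified aggregation: majority vote over the branches' final answers.
    return Counter(answers).most_common(1)[0][0]
```

In use, `run_branch` would wrap a model call that follows the given guideline; passing a different model per branch is one way the multi-model collaboration mentioned in the abstract could be approximated in this sketch.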
