A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei, Haowei Liu, Xuyang Wu, Yi Fang
TL;DR
This survey analyzes how feedback-based multi-step reasoning enhances mathematical problem solving in large language models (LLMs). It distinguishes training-based approaches (step-level process reward models, or PRMs, and supervision from outcome reward models, or ORMs) from training-free approaches (self-evaluation, logits, and verification with external tools), and details their roles in aggregation, search, and refinement. It presents a taxonomy of methods that rely on step-level versus outcome-level guidance, including hybrid strategies, and surveys datasets and evaluation challenges relevant to math reasoning. Key contributions include a comprehensive mapping of reward-modeling techniques (PRMs/ORMs), search strategies (Monte Carlo tree search (MCTS), beam search), and prompting/refinement methods, along with a critical discussion of reward hacking and distribution shift. The work aims to establish a foundation and guidance for advancing robust, efficient, and scalable math reasoning with LLMs across diverse problem domains and languages.
Abstract
Recent progress in large language models (LLMs) has shown that chain-of-thought prompting strategies improve the reasoning ability of LLMs by encouraging them to solve problems through multiple steps. Subsequent research therefore aimed to integrate the multi-step reasoning process into the LLM itself, using process rewards as feedback, and achieved improvements over prompting strategies. Because step-level annotation is costly, some works instead turn to outcome rewards as feedback. Aside from these training-based approaches, training-free techniques leverage frozen LLMs or external tools to provide feedback at each step and enhance the reasoning process. Given the abundance of work in mathematics, owing to its logical nature, we present a survey of strategies that use feedback at the step and outcome levels to enhance multi-step math reasoning in LLMs. As multi-step reasoning emerges as a crucial component in scaling LLMs, we hope to lay a foundation that makes this area easier to understand and empowers further research.
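To make the difference between step-level and outcome-level feedback concrete, the following is a minimal sketch, assuming each candidate solution has already been split into steps and scored by some reward model. The step scores, the candidate names, and the helper functions (aggregate_min, aggregate_product, aggregate_last, best_of_n) are all hypothetical and chosen only for illustration. The min and product aggregations are common choices for turning process rewards into a solution score, while keeping only the final score stands in for outcome-level feedback. The sketch uses these scores for best-of-N selection and is not the method of any particular surveyed work.

```python
import math
from typing import Callable, Sequence

# Hypothetical per-step scores in [0, 1], standing in for what a process
# reward model (PRM) might assign to each reasoning step of one solution.
StepScores = Sequence[float]


def aggregate_min(scores: StepScores) -> float:
    """Step-level: a solution is only as good as its weakest step."""
    return min(scores)


def aggregate_product(scores: StepScores) -> float:
    """Step-level: treat scores as independent probabilities that each step is correct."""
    return math.prod(scores)


def aggregate_last(scores: StepScores) -> float:
    """Outcome-level proxy: ignore intermediate steps and keep only the final score."""
    return scores[-1]


def best_of_n(candidates: dict[str, StepScores],
              aggregate: Callable[[StepScores], float]) -> str:
    """Best-of-N selection: return the candidate with the highest aggregated score."""
    return max(candidates, key=lambda name: aggregate(candidates[name]))


# Toy, made-up scores: candidate "A" has a weak middle step but a confident
# final step; candidate "B" is uniformly moderate.
candidates = {
    "A": [0.9, 0.2, 0.95],
    "B": [0.7, 0.7, 0.8],
}

for agg in (aggregate_min, aggregate_product, aggregate_last):
    print(agg.__name__, "->", best_of_n(candidates, agg))
# aggregate_min -> B
# aggregate_product -> B
# aggregate_last -> A
```

With these toy numbers, both step-level aggregations reject candidate A because of its weak intermediate step, whereas the outcome-only proxy prefers A because of its confident final step. This illustrates why step-level feedback can catch flawed reasoning that happens to end in a plausible answer.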
