SMRC: Aligning Large Language Models with Student Reasoning for Mathematical Error Correction
Biaojie Zeng, Min Zhang, Juan Zhou, Fengrui Liu, Ruiyang Huang, Xin Lin
TL;DR
This work tackles the challenge of aligning large language models with student problem-solving processes for mathematical error correction. It introduces SAMRC and a practical SMRC framework that uses Monte Carlo Tree Search guided by a process-supervised reward model, augmented by BFS-derived reward signals, to preserve correct student reasoning while reaching correct solutions. A new real-world dataset, MSEB, captures authentic student solutions and multiple valid strategies, enabling evaluation of both solution accuracy and step retention via the CSRR metric and harmonic mean. Empirical results on ProcessBench, MR-GSM8K, and MSEB show that SMRC outperforms baselines and generalizes across base models and search strategies, advancing the educational applicability of AI-assisted tutoring.
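The two evaluation quantities named above can be illustrated with a minimal sketch. It rests on assumptions not stated in this summary: CSRR is read as the fraction of the student's correct steps that reappear in the corrected solution, steps are matched by exact string equality, and the harmonic mean combines solution accuracy with CSRR. The function names `csrr` and `harmonic_mean` are hypothetical.

```python
from typing import Sequence


def csrr(student_correct_steps: Sequence[str], corrected_steps: Sequence[str]) -> float:
    """Assumed reading of CSRR: share of the student's correct steps kept after correction."""
    if not student_correct_steps:
        return 1.0  # nothing to retain, so nothing was lost (assumption)
    kept = sum(1 for step in student_correct_steps if step in corrected_steps)
    return kept / len(student_correct_steps)


def harmonic_mean(accuracy: float, retention: float) -> float:
    """Harmonic mean of two scores in [0, 1]; defined as 0 when both are 0."""
    if accuracy + retention == 0:
        return 0.0
    return 2 * accuracy * retention / (accuracy + retention)


# Toy example: the student's first two steps were correct; the correction keeps one of them.
student_correct = ["2x + 1 = 7", "2x = 6"]
corrected = ["2x + 1 = 7", "x = (7 - 1) / 2", "x = 3"]
retention = csrr(student_correct, corrected)   # one of two steps retained
combined = harmonic_mean(1.0, retention)       # correct final answer, partial retention
```

Under this reading, a correction that reaches the right answer but discards the student's valid steps is penalized, because the harmonic mean is pulled toward the lower of the two scores.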
Abstract
Large language models (LLMs) often make reasoning errors when solving mathematical problems, and automatically detecting and correcting these errors has become an important research direction. However, existing approaches \textit{mainly focus on self-correction within the model}, which falls short of the ``teacher-style'' correction required in educational settings, \textit{i.e.}, systematically guiding and revising a student's problem-solving process. To address this gap, we propose \texttt{SMRC} (\textit{\underline{S}tudent \underline{M}athematical \underline{R}easoning \underline{C}orrection}), a novel method that aligns LLMs with student reasoning. Specifically, \texttt{SMRC} formulates student reasoning as a multi-step sequential decision problem and uses Monte Carlo Tree Search (MCTS) to explore optimal correction paths. To reduce the cost of annotating process-level rewards, we use breadth-first search (BFS) guided by LLMs, together with final-answer evaluation, to generate reward signals. These signals are then distributed across intermediate reasoning steps via a back-propagation mechanism, enabling fine-grained process supervision. Additionally, we construct a benchmark for high school mathematics, MSEB (Multi-Solution Error Benchmark), consisting of 158 instances that include problem statements, student solutions, and correct reasoning steps. We further propose a dual evaluation protocol centered on \textbf{solution accuracy} and \textbf{correct-step retention}, offering a comprehensive measure of educational applicability. Experiments demonstrate that \texttt{SMRC} significantly outperforms existing methods on two public datasets (ProcessBench and MR-GSM8K) and on our MSEB in terms of effectiveness and overall performance. The code and data are available at https://github.com/Mind-Lab-ECNU/SMRC.
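The idea of turning final-answer checks into step-level rewards can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the `StepNode` structure, the `backpropagate` function, and the averaging rule (each step receives the mean reward of its children, so its score estimates how often continuations from that step reach a correct answer) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepNode:
    """One reasoning step in a BFS-expanded solution tree (hypothetical structure)."""
    text: str
    children: List["StepNode"] = field(default_factory=list)
    final_answer_correct: Optional[bool] = None  # only meaningful on leaves
    reward: float = 0.0


def backpropagate(node: StepNode) -> float:
    """Score leaves by final-answer correctness, then average rewards upward."""
    if not node.children:
        node.reward = 1.0 if node.final_answer_correct else 0.0
    else:
        child_rewards = [backpropagate(child) for child in node.children]
        node.reward = sum(child_rewards) / len(child_rewards)
    return node.reward


# Toy tree for "Solve 2x + 1 = 7": one branch always succeeds, the other only sometimes.
good_step = StepNode("2x = 6", children=[
    StepNode("x = 3", final_answer_correct=True),
    StepNode("x = 6 / 2 = 3", final_answer_correct=True),
])
risky_step = StepNode("2x = 7 + 1", children=[
    StepNode("x = 4", final_answer_correct=False),
    StepNode("recheck: 2x = 6, x = 3", final_answer_correct=True),
])
root = StepNode("2x + 1 = 7", children=[good_step, risky_step])
backpropagate(root)
```

In this toy tree the sound step ends with a higher reward than the faulty one, which is the kind of intermediate signal a process-supervised reward model could be trained on without manual step-level labels.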
