SMRC: Aligning Large Language Models with Student Reasoning for Mathematical Error Correction

Biaojie Zeng, Min Zhang, Juan Zhou, Fengrui Liu, Ruiyang Huang, Xin Lin

TL;DR

This work tackles the challenge of aligning large language models with student problem-solving processes for mathematical error correction. It introduces SAMRC and a practical SMRC framework that uses Monte Carlo Tree Search guided by a process-supervised reward model, augmented by BFS-derived reward signals, to preserve correct student reasoning while reaching correct solutions. A new real-world dataset, MSEB, captures authentic student solutions and multiple valid strategies. It supports evaluating both solution accuracy and correct-step retention, using the CSRR metric and the harmonic mean of the two. Empirical results on ProcessBench, MR-GSM8K, and MSEB show that SMRC outperforms baselines and generalizes across base models and search strategies, advancing the educational applicability of AI-assisted tutoring.
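The dual evaluation pairs solution accuracy with a step-retention score. The sketch below shows one way such scores could be combined. It is a minimal illustration, not the paper's exact formulas. The per-instance fields, the macro-averaged CSRR definition, and the convention for instances with no correct student steps are all assumptions. The names `CorrectionResult` and `correct_step_retention` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CorrectionResult:
    answer_correct: bool        # corrected solution reaches the right final answer
    student_correct_steps: int  # correct steps in the original student solution
    retained_steps: int         # how many of those correct steps the correction keeps


def accuracy(results: List[CorrectionResult]) -> float:
    return sum(r.answer_correct for r in results) / len(results)


def correct_step_retention(results: List[CorrectionResult]) -> float:
    # Assumed CSRR: macro-average of per-instance retention rates.
    # Assumed convention: an instance with no correct student steps counts as fully retained.
    rates = [
        1.0 if r.student_correct_steps == 0 else r.retained_steps / r.student_correct_steps
        for r in results
    ]
    return sum(rates) / len(rates)


def harmonic_mean(a: float, b: float) -> float:
    # Equals 0 when either component is 0, so a method cannot ignore one score entirely.
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Toy results for four corrected student solutions (illustrative values only).
results = [
    CorrectionResult(True, 4, 3),
    CorrectionResult(True, 2, 2),
    CorrectionResult(False, 2, 0),
    CorrectionResult(True, 2, 1),
]
acc = accuracy(results)                  # 0.75
csrr = correct_step_retention(results)   # 0.5625
overall = harmonic_mean(acc, csrr)       # 9/14, about 0.643
```

The harmonic mean is used here because it rewards balance. A correction that rewrites everything can score high on accuracy, but it is still pulled down by low retention.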

Abstract

Large language models (LLMs) often make reasoning errors when solving mathematical problems, and automatically detecting and correcting these errors has become an important research direction. However, existing approaches mainly focus on self-correction within the model. This falls short of the "teacher-style" correction required in educational settings, i.e., systematically guiding and revising a student's problem-solving process. To address this gap, we propose SMRC (Student Mathematical Reasoning Correction), a novel method that aligns LLMs with student reasoning. Specifically, SMRC formulates student reasoning as a multi-step sequential decision problem and introduces Monte Carlo Tree Search (MCTS) to explore optimal correction paths. To reduce the cost of annotating process-level rewards, we leverage breadth-first search (BFS) guided by LLMs and final-answer evaluation to generate reward signals. These signals are then distributed across intermediate reasoning steps via a back-propagation mechanism, enabling fine-grained process supervision. Additionally, we construct MSEB (Multi-Solution Error Benchmark), a high school mathematics benchmark of 158 instances, each comprising a problem statement, a student solution, and correct reasoning steps. We further propose a dual evaluation protocol centered on solution accuracy and correct-step retention, offering a comprehensive measure of educational applicability. Experiments demonstrate that SMRC significantly outperforms existing methods in both effectiveness and overall performance on two public datasets (ProcessBench and MR-GSM8K) and on our MSEB. The code and data are available at https://github.com/Mind-Lab-ECNU/SMRC.
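To make the search formulation concrete, here is a minimal, generic MCTS loop over reasoning steps that starts from a student's prefix. It is a sketch under stated assumptions, not SMRC's implementation. `propose` is a hypothetical stand-in for the LLM that suggests next steps, and `score` is a hypothetical stand-in for the process reward model. Selection uses the standard UCT rule. SMRC's actual selection, expansion, and reward details may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    steps: List[str]                      # reasoning steps on the path from the root
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float) -> float:
    # Unvisited children are tried first; otherwise balance exploitation and exploration.
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_correct(
    student_prefix: List[str],
    propose: Callable[[List[str]], List[str]],  # hypothetical LLM step generator
    score: Callable[[List[str]], float],        # hypothetical reward-model stand-in
    is_terminal: Callable[[List[str]], bool],
    iterations: int = 50,
    c: float = 1.4,
) -> List[str]:
    root = Node(list(student_prefix))           # search starts from the student's steps
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT until reaching a leaf of the current tree.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
        # 2. Expansion: add candidate next steps unless the path is complete.
        if not is_terminal(node.steps):
            node.children = [Node(node.steps + [s], parent=node) for s in propose(node.steps)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: score the (partial) path with the reward model stand-in.
        reward = score(node.steps)
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Read out the most-visited path as the corrected solution.
    best = root
    while best.children:
        best = max(best.children, key=lambda ch: ch.visits)
    return best.steps


# Toy usage: the student correctly wrote "2 + 3 = 5" for (2 + 3) * 4.
def propose(steps: List[str]) -> List[str]:
    return ["5 * 4 = 20", "5 * 4 = 24"]


def score(steps: List[str]) -> float:
    return 1.0 if steps[-1] == "5 * 4 = 20" else -1.0


def is_terminal(steps: List[str]) -> bool:
    return len(steps) >= 2


corrected = mcts_correct(["2 + 3 = 5"], propose, score, is_terminal, iterations=30)
```

Rooting the tree at the student's own steps means the search only explores continuations of what the student already wrote. That is one simple way to keep correct student reasoning intact.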

Paper Structure

This paper contains 23 sections, 7 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Example of two correction modes: given an input question and student answer (Left), model self-correction detects reasoning errors in LLMs (Middle), while "teacher-style" correction identifies errors in the student's reasoning process (Right).
  • Figure 2: (A) Self-correction methods for LLMs often fail to address students' reasoning processes, relying instead on directly generated model answers; (B) SMRC takes the student's question and initial attempt as input and uses Monte Carlo Tree Search to generate educational corrections; (C) the framework combines outcome and process rewards to train the reward model; (D) the method starts from the student's attempt and produces the final correction through four reasoning steps.
  • Figure 3: MSEB Dataset Construction Pipeline.
  • Figure 4: Initial reasoning tree structure with leaf node value assignments. Green nodes indicate correct answers (+1), red nodes indicate incorrect answers (-1).
  • Figure 6: Reasoning tree after step-level reward propagation. Node values reflect cumulative rewards computed through backtracking from leaf nodes to the root. Green nodes indicate correct answers (+1), red nodes indicate incorrect answers (-1).
  • ...and 5 more figures
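The leaf labeling in Figure 4 and the upward propagation in Figure 6 can be sketched as follows. This is a minimal sketch, not the paper's exact procedure. It assumes the BFS tree has already been generated. Each leaf is labeled +1 or -1 by comparing its final answer with a gold answer. A node's value is taken to be the sum of the leaf rewards below it, matching the "cumulative rewards" wording, and the mean is provided as a normalized variant. The paper's exact aggregation rule may differ. `StepNode` and `propagate_rewards` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepNode:
    text: str                                # one reasoning step
    children: List["StepNode"] = field(default_factory=list)
    final_answer: Optional[str] = None       # set only on leaf (complete) solutions
    total: float = 0.0                       # cumulative reward of the leaves below
    leaves: int = 0                          # number of leaves below

    def mean_value(self) -> float:
        return self.total / self.leaves if self.leaves else 0.0


def propagate_rewards(root: StepNode, gold_answer: str) -> None:
    # Collect nodes in breadth-first order.
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children)
    # Reverse BFS order visits every child before its parent.
    for node in reversed(order):
        if not node.children:
            # Outcome reward: +1 for a correct final answer, -1 otherwise.
            node.total = 1.0 if node.final_answer == gold_answer else -1.0
            node.leaves = 1
        else:
            node.total = sum(child.total for child in node.children)
            node.leaves = sum(child.leaves for child in node.children)


# Toy tree for "x + 1 = 3": one step leads to two correct leaves, the other to a wrong one.
root = StepNode("x + 1 = 3", children=[
    StepNode("x = 3 - 1", children=[
        StepNode("x = 2", final_answer="2"),
        StepNode("so x equals 2", final_answer="2"),
    ]),
    StepNode("x = 3 + 1", children=[StepNode("x = 4", final_answer="4")]),
])
propagate_rewards(root, gold_answer="2")
```

After propagation, the "x = 3 - 1" step carries a positive value and the "x = 3 + 1" step a negative one. This gives step-level labels derived only from final-answer checks, the kind of signal a process reward model could be trained on.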