DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction

Hao Chen; Biaojie Zeng; Xin Lin; Liang He; Aimin Zhou

TL;DR

The paper investigates whether mathematical reasoning and error correction are equivalent abilities in large language models (LLMs) and introduces the math world problems correction (MWPC) task to study correction. Through comparative experiments across diverse LLMs, it shows that models can solve problems whose flawed solutions they fail to correct, and vice versa, revealing a gap between reasoning and correcting. To close this gap, it proposes Diagnostic-Oriented Prompting (DOP), which draws on external educational resources at three levels (NA, BE, SA) to strengthen correction, and shows that DOP consistently improves correction performance over standard prompting. The work stresses the importance of robust error correction in math education and provides datasets, code, and a framework for training and evaluating LLMs as mathematical tutors.

Abstract

Math world problems correction (MWPC) is a novel task dedicated to rectifying reasoning errors made while solving mathematical problems. Leveraging advances in large language models (LLMs), this paper pursues two objectives: (1) distinguishing mathematical reasoning from error correction; and (2) exploring strategies that enhance the error correction capabilities of LLMs in mathematics to solve the MWPC task. We observe that in real-time education, helping students recognize their mistakes is more crucial than simply providing correct answers. However, current research tends to prioritize obtaining accurate solutions to math problems over correcting potentially incorrect ones. We therefore shift the research paradigm and demonstrate that improving mathematical reasoning ability does not equate to mastery of error correction. We also propose a novel method, diagnostic-oriented prompting (DOP), designed to help LLMs excel at error correction. In experiments, DOP performs strongly, consistently improving correction over standard prompting. We argue that in mathematical education, the demand for outstanding correctors surpasses that for proficient reasoners. Code and data are available at https://github.com/ChenhaoEcnuCS/Reason-Correct.
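The claim that reasoning and correcting are not equivalent can be checked problem by problem. As a minimal sketch, assume each problem yields two booleans: whether the model solved it from scratch (the MWPS setting) and whether it repaired a flawed solution to it (the MWPC setting). Tallying these into a 2x2 table makes the gap visible: non-zero "solve only" and "correct only" cells mean that neither ability implies the other. The `Agreement` and `tally` names and the toy data are hypothetical illustrations; they are not the paper's evaluation code, its $E_r$ and $E_c$ definitions, or its results.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Agreement:
    """Hypothetical 2x2 tally of per-problem reasoning vs. correcting outcomes."""
    both: int = 0          # solved the problem AND corrected the flawed solution
    solve_only: int = 0    # solved the problem, but failed to correct the flawed solution
    correct_only: int = 0  # corrected the flawed solution, but failed to solve the problem
    neither: int = 0       # failed at both

    @property
    def total(self) -> int:
        return self.both + self.solve_only + self.correct_only + self.neither


def tally(outcomes: Iterable[Tuple[bool, bool]]) -> Agreement:
    """Count (solved, corrected) pairs into the four cells of an Agreement."""
    a = Agreement()
    for solved, corrected in outcomes:
        if solved and corrected:
            a.both += 1
        elif solved:
            a.solve_only += 1
        elif corrected:
            a.correct_only += 1
        else:
            a.neither += 1
    return a


if __name__ == "__main__":
    # Toy outcomes for illustration only; these are NOT results from the paper.
    toy = [(True, True), (True, False), (True, False), (False, True), (False, False)]
    a = tally(toy)
    print(a.solve_only, a.correct_only)  # 2 1
```

If the two abilities were equivalent, only the `both` and `neither` cells would be populated; any mass in the off-diagonal cells is direct evidence of the gap the paper describes.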

Paper Structure (15 sections, 5 equations, 5 figures, 2 tables)

Introduction
Background and Related Work
  Mathematical Reasoning Through LLMs
  Correction Through LLMs
  AI for Mathematical Education
Methodology
  Validating Differences between Reasoning and Correcting
  Diagnostic-Oriented Prompting (DOP)
Experiments and Analysis
  Experiment Setup
  Results and Analysis
    Comparative Experiments for Validation
    Diagnostic-Oriented Prompting (DOP)
Conclusions
Limitations and Future Work

Figures (5)

Figure 1: Examples of reasoning and correcting.
Figure 2: The overall framework of our research. In the first stage, we run both the MWPS and MWPC tasks on our candidate models and show that mathematical reasoning and correcting capabilities are not fully equivalent. In the second stage, we apply our strategy, Diagnostic-Oriented Prompting (DOP), to enhance the candidate models' correcting abilities in the mathematical domain.
Figure 3: An example of different levels of DOP.
Figure 4: Results of $E_r$ and $E_c$. Candidate models are abbreviated by the initial letters of their names; for example, 'M-M-7b' denotes MetaMath-Mistral-7b.
Figure 5: Experiment results of DOP. Success rates of error correction under different scenarios, shown as bar charts.