DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction
Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou
TL;DR
The paper investigates whether mathematical reasoning and error correction are equivalent abilities in large language models and introduces the math world problems correction (MWPC) task to study correction. Through comparative experiments across diverse LLMs, it demonstrates that models can solve problems without being able to correct mistakes, and vice versa, highlighting a gap between reasoning and correcting. To address this, it proposes Diagnostic-Oriented Prompting (DOP), which leverages external educational resources at three levels (NA, BE, SA) to enhance correction capabilities, and shows that DOP consistently improves correction performance over standard prompting. The work emphasizes the importance of robust error correction in math education and provides datasets, code, and a framework to train and evaluate LLMs as effective mathematical tutors.
Abstract
Math world problems correction (MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging advances in large language models (LLMs), we address two key objectives: (1) distinguishing between mathematical reasoning and error correction; and (2) exploring strategies to enhance the error correction capabilities of LLMs in mathematics in order to solve the MWPC task. We observe that, in real-time education, helping students recognize their mistakes is more crucial than simply providing correct answers. However, current research tends to prioritize obtaining accurate solutions to math problems over correcting potentially incorrect ones. We therefore modify the research paradigm and demonstrate that improving mathematical reasoning ability does not equate to mastery of error correction. We also propose a novel method called diagnostic-oriented prompting (DOP), aimed at helping LLMs excel in error correction. In experiments, DOP shows outstanding performance, highlighting its significant impact. We argue that in mathematical education, the demand for outstanding correctors surpasses that for proficient reasoners. Code and data are available at https://github.com/ChenhaoEcnuCS/Reason-Correct.
