MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen
TL;DR
The paper addresses the trustworthiness of AI-assisted feedback in mathematics education by introducing MathEDU, a dataset of authentic student problem-solving processes paired with teacher feedback. It defines three hierarchical tasks (answer correctness classification, error identification, and feedback generation) and evaluates prompting and LoRA-based fine-tuning across multiple LLMs. Results show that fine-tuning improves correctness judgments and error localization, but generating targeted, pedagogically useful feedback remains difficult, and model output frequently diverges from teacher guidance. The work highlights the need for pedagogy-aware AI systems and larger, more diverse datasets to bridge the gap between model outputs and effective classroom feedback, with practical implications for scalable math tutoring and assessment.
Abstract
The increasing reliance on Large Language Models (LLMs) across various domains extends to education, where students increasingly use generative AI as a tool for learning. While prior work has examined LLMs' mathematical ability, their reliability in grading authentic student problem-solving processes and delivering effective feedback remains underexplored. This study introduces MathEDU, a dataset consisting of student problem-solving processes in mathematics and corresponding teacher-written feedback. We systematically evaluate the reliability of various models across three hierarchical tasks: answer correctness classification, error identification, and feedback generation. Experimental results show that fine-tuning strategies effectively improve performance in classifying correctness and locating erroneous steps. However, the generated feedback across models shows a considerable gap from teacher-written feedback. Critically, the generated feedback is often verbose and fails to provide targeted explanations for the student's underlying misconceptions. This gap underscores the urgent need for trustworthy and pedagogy-aware AI feedback in education.
