MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support

Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen

TL;DR

The paper addresses trustworthy AI-assisted feedback in mathematics education by introducing MathEDU, a dataset of authentic student problem-solving processes paired with teacher feedback. It defines three hierarchical tasks—answer correctness classification, error identification, and feedback generation—and evaluates prompting and LoRA-based fine-tuning across multiple LLMs. Results show that fine-tuning improves correctness judgments and error localization, but generating targeted, pedagogically useful feedback remains difficult and frequently diverges from teacher guidance. The work highlights the need for pedagogy-aware AI systems and larger, more diverse datasets to bridge the gap between model outputs and effective classroom feedback, with practical implications for scalable math tutoring and assessment.

Abstract

The increasing reliance on Large Language Models (LLMs) across various domains extends to education, where students increasingly use generative AI as a tool for learning. While prior work has examined LLMs' mathematical ability, their reliability in grading authentic student problem-solving processes and delivering effective feedback remains underexplored. This study introduces MathEDU, a dataset consisting of student problem-solving processes in mathematics and corresponding teacher-written feedback. We systematically evaluate the reliability of various models across three hierarchical tasks: answer correctness classification, error identification, and feedback generation. Experimental results show that fine-tuning strategies effectively improve performance in classifying correctness and locating erroneous steps. However, the generated feedback across models shows a considerable gap from teacher-written feedback. Critically, the generated feedback is often verbose and fails to provide targeted explanations for the student's underlying misconceptions. This emphasizes the urgent need for trustworthy and pedagogy-aware AI feedback in education.

Paper Structure

This paper contains 32 sections, 1 equation, 4 figures, 19 tables.

Figures (4)

  • Figure 1: Example of Model Providing Lengthy but Inappropriate Feedback to the Student's Solution.
  • Figure 2: Performance of six students across six subject categories (General, Gain, Physics, Geometry, Probability, and Other). Scores are expressed as percentages, with higher values indicating better correctness.
  • Figure 3: Average numbers of equations and words used by six students in their problem-solving processes.
  • Figure 4: Feedback Categories by Models.