Learning to Make MISTAKEs: Modeling Incorrect Student Thinking And Key Errors

Alexis Ross, Jacob Andreas

TL;DR

MISTAKE models incorrect student thinking by learning, without supervision, from cycle-consistent synthetic data that links misconceptions, faulty reasoning, and incorrect answers. The approach combines an inner data-generation loop (MISTAKE-Generate) with an outer iterative training loop (MISTAKE-Update) to produce two models: a student simulator and a misconception inference model. Across three educational tasks on the EEDI dataset, MISTAKE improves student-simulation accuracy, misconception-inference MAP@k, and the realism of generated distractors, with the cycle-consistency checks contributing notably to these gains. The work shows that explicitly modeling incorrect reasoning benefits educational AI, offering a path toward realistic student simulators and targeted feedback in tutoring settings.
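
MAP@k, the misconception-inference metric named above, is not defined in this summary. The sketch below shows one common definition: average precision truncated at rank k, normalized by min(k, number of relevant items), then averaged over queries. The function names are illustrative, and the paper's exact variant may differ.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_k(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> float:
    """AP@k for one query (assumed definition, not taken from the paper)."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(ranked[:k], start=1):
        # Count each relevant item once, at the first rank where it appears.
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)
    return score / min(k, len(relevant))


def map_at_k(
    all_ranked: Sequence[Sequence[Hashable]],
    all_relevant: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Mean of AP@k over all queries."""
    if len(all_ranked) != len(all_relevant):
        raise ValueError("need one relevant set per ranked list")
    if not all_ranked:
        return 0.0
    total = sum(
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(all_ranked, all_relevant)
    )
    return total / len(all_ranked)


# Two queries: the first ranks its true misconception first, the second ranks it second.
print(map_at_k([["a", "b"], ["x", "y"]], [{"a"}, {"y"}], k=2))  # 0.75
```

When each incorrect answer has exactly one gold misconception, this reduces to the reciprocal rank of that misconception, or 0 if it falls outside the top k.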

Abstract

Research on reasoning in language models (LMs) predominantly focuses on improving the correctness of their outputs. But some important applications require modeling reasoning patterns that are incorrect. For example, automated systems that can reason about and simulate student errors are useful for providing real-time feedback in the classroom or offline practice for educators-in-training. This paper presents a new method, MISTAKE, that (1) constructs high-quality synthetic examples of reasoning errors by leveraging cycle consistency between incorrect answers and latent misconceptions; and (2) uses the generated data to learn models for student simulation, misconception classification, and answer generation. We evaluate MISTAKE on three educational tasks and find that it results in (1) higher accuracy when simulating incorrect student answers based on specific misconceptions, (2) increased performance inferring latent misconceptions from observed incorrect answers, and (3) higher alignment with expert-written distractor answers when generating incorrect answers (e.g., for multiple-choice tests).
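
The cycle-consistency idea in step (1) can be illustrated with a minimal sketch: simulate an answer from a misconception, infer a misconception back from that answer, and keep the example only if the two agree. Everything here is an assumption made for illustration: the `Example` record, the `simulate` and `infer` signatures, the exact-match agreement test, and the toy lookup tables that stand in for language models. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    misconception: str
    reasoning: str
    answer: str


# Hypothetical interfaces (assumptions, not the paper's API):
#   simulate(question, misconception) -> (reasoning, answer)
#   infer(question, answer) -> misconception
Simulator = Callable[[str, str], Tuple[str, str]]
Inferrer = Callable[[str, str], str]


def generate_cycle_consistent(
    pairs: Iterable[Tuple[str, str]], simulate: Simulator, infer: Inferrer
) -> List[Example]:
    """Keep only examples whose answer maps back to the misconception that produced it."""
    kept: List[Example] = []
    for question, misconception in pairs:
        reasoning, answer = simulate(question, misconception)
        recovered = infer(question, answer)
        if recovered == misconception:  # the cycle closes: keep the example
            kept.append(Example(question, misconception, reasoning, answer))
    return kept


# Toy stand-ins for the two models, for demonstration only.
Q = "1/2 + 1/3"
_SIM: Dict[Tuple[str, str], Tuple[str, str]] = {
    (Q, "adds numerators and denominators"): ("1+1=2 and 2+3=5", "2/5"),
    (Q, "ignores denominators"): ("1+1=2", "2"),
}
_INF: Dict[Tuple[str, str], str] = {
    (Q, "2/5"): "adds numerators and denominators",
    (Q, "2"): "treats fractions as whole numbers",
}


def toy_simulate(question: str, misconception: str) -> Tuple[str, str]:
    return _SIM[(question, misconception)]


def toy_infer(question: str, answer: str) -> str:
    return _INF.get((question, answer), "unknown")


PAIRS = [(Q, "adds numerators and denominators"), (Q, "ignores denominators")]
kept = generate_cycle_consistent(PAIRS, toy_simulate, toy_infer)
for example in kept:
    print(example.misconception, "->", example.answer)
```

In this toy run only the first pair survives, because the answer simulated for the second misconception is attributed to a different misconception by the inference step. A real system would likely need a softer agreement test than string equality.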

Paper Structure

This paper contains 25 sections, 1 equation, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Examples of mathematical errors that result from common misconceptions shared among students.
  • Figure 2: Overview of MISTAKE. MISTAKE-Generate generates data by enforcing cycle consistency between misconceptions, reasoning traces, and answers. MISTAKE-Update iteratively trains student simulation and misconception inference models on this data, generates new data using MISTAKE-Generate and these models, and repeats.
  • Figure 3: Results on the three educational tasks. We report means and standard errors across 5 random seeds. (a) Student simulation accuracy of MISTAKE variants (test set). (b) Misconception inference results for MISTAKE variants (test set). (c) Precision of generated distractor answers for MISTAKE-cycle+correct (validation set).