
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning

Hunter McNichols, Wanyong Feng, Jaewook Lee, Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan

TL;DR

This work automates two core aspects of math MCQ design: generating plausible distractors and generating the corresponding feedback messages, using large language models within an in-context learning framework. It formalizes the two tasks as functions $g^{dis}$ (distractor generation) and $g^{fb}$ (feedback generation) and uses kNN-based retrieval of similar MCQs to build few-shot prompts. Outputs are evaluated with both standard metrics and novel reference-free metrics. Experiments on a real-world dataset of about 1.4K MCQs show that kNN-based prompting substantially outperforms baselines for distractor generation. For feedback generation, the results reveal more nuanced strengths and limitations, including the value of zero-shot prompting for reference-free evaluation. The findings highlight opportunities to scale MCQ authoring and point to future work, such as better aligning generated distractors with actual student errors and conducting human evaluations to further validate the generated content.
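The kNN-based prompt construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `MCQ` record, the helper names `knn_examples` and `build_prompt`, the toy 2-D embeddings (standing in for whatever text encoder produces MCQ representations), cosine similarity as the distance measure, and the prompt layout are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MCQ:
    stem: str
    correct_answer: str
    distractors: list[str]
    # Hypothetical: a precomputed embedding of the question text.
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def knn_examples(target: MCQ, pool: list[MCQ], k: int) -> list[MCQ]:
    """Return the k pool MCQs most similar to the target.

    The pool is assumed to exclude the target (e.g. a training split).
    """
    ranked = sorted(pool, key=lambda q: cosine(target.embedding, q.embedding), reverse=True)
    return ranked[:k]


def build_prompt(examples: list[MCQ], target: MCQ) -> str:
    """Format retrieved MCQs as few-shot examples, then leave the target's distractors blank."""
    parts = [
        f"Question: {ex.stem}\nAnswer: {ex.correct_answer}\nDistractors: " + "; ".join(ex.distractors)
        for ex in examples
    ]
    parts.append(f"Question: {target.stem}\nAnswer: {target.correct_answer}\nDistractors:")
    return "\n\n".join(parts)


# Toy, hypothetical data for illustration only.
pool = [
    MCQ("Increase 40 by 10%", "44", ["40.1", "4", "50"], [1.0, 0.0]),
    MCQ("Simplify 6/8", "3/4", ["6/8", "1/2", "2/3"], [0.0, 1.0]),
    MCQ("Decrease 80 by 25%", "60", ["55", "20", "79.75"], [0.9, 0.1]),
]
target = MCQ("Decrease 200 by 15%", "170", [], [0.95, 0.05])

examples = knn_examples(target, pool, k=2)
prompt = build_prompt(examples, target)
print(prompt)
```

Retrieving neighbors by similarity means the in-context examples tend to share the target's skill (here, percentage change), so the distractors they show illustrate error patterns that plausibly transfer to the new question; the unrelated fraction question is left out of the prompt.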

Abstract

Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and they are a reliable form of assessment. An important aspect of MCQs is their distractors, i.e., incorrect options designed to target specific misconceptions or insufficient knowledge among students. To date, crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which limits scalability. In this work, we explore the task of automatically generating distractors and corresponding feedback messages for math MCQs using large language models. We establish a formulation of these two tasks and propose a simple, in-context learning-based solution. Moreover, we propose generative AI-based metrics for evaluating the quality of the feedback messages. We conduct extensive experiments on these tasks using a real-world MCQ dataset. Our findings suggest that there is substantial room for improvement in automated distractor and feedback generation; based on these findings, we outline several directions for future work.
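Generated distractors can be scored against the reference distractors written by content designers. The sketch below assumes three simple set-overlap scores (exact, partial, and proportional match); these definitions, the `distractor_match` and `normalize` helpers, and the string normalization are illustrative assumptions and may differ from the metrics used in the paper.

```python
def normalize(option: str) -> str:
    # Hypothetical normalization: trim whitespace and lowercase.
    # Real math answers would likely need numeric or LaTeX-aware comparison.
    return option.strip().lower()


def distractor_match(generated: list[str], reference: list[str]) -> dict:
    """Compare generated distractors with reference distractors as sets."""
    gen = {normalize(d) for d in generated}
    ref = {normalize(d) for d in reference}
    hits = len(gen & ref)
    return {
        # Every reference distractor appears among the generated ones.
        "exact": hits == len(ref) and len(ref) > 0,
        # At least one reference distractor was generated.
        "partial": hits > 0,
        # Fraction of reference distractors that were generated.
        "proportional": hits / len(ref) if ref else 0.0,
    }


# Hypothetical example: 2 of the 3 reference distractors are recovered.
scores = distractor_match(["20", " 55 ", "65"], ["55", "20", "79.75"])
print(scores)
```

Plain string matching undercounts equivalent answers written differently (for example "0.5" versus "1/2"), which is one reason reference-based scores alone give an incomplete picture of distractor quality.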

Paper Structure

This paper contains 23 sections, 6 equations, 6 figures, and 11 tables.

Figures (6)

  • Figure 1: Different parts of math MCQs illustrated with an example.
  • Figure 2: Overview of distractor generation with a math MCQ on "compound percentage decrease".
  • Figure 3: Distractor generation zero-shot prompt.
  • Figure 4: Feedback generation zero-shot prompt.
  • Figure 5: Answer adjustment evaluation prompt.