Difficulty-Controllable Cloze Question Distractor Generation

Seokhoon Kang, Yejin Jeon, Seonjeong Hwang, Gary Geunbae Lee

TL;DR

The paper tackles the challenge of generating distractors with controllable difficulty for cloze questions by presenting a two-stage data augmentation pipeline (two-way candidate generation, filtering, and difficulty clustering) and a multitask learning framework (main DCDG task plus ASDE and DDDE auxiliary tasks). The augmented data expands CLOTH and enables explicit difficulty control, while the multitask model learns to generate distractors that align with predefined difficulty levels and distinguish them from correct answers. Automatic and human evaluations show the approach achieves higher alignment with human-perceived difficulty and lower invalid distractor rates than GPT-4o, with robust performance across different models and datasets. The work provides a scalable, teacher-centric tool for adaptive assessment in education and releases both data and models to support further research.

Abstract

Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process in order to produce diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, this newly created dataset is leveraged to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
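As a rough illustration of the difficulty-categorization step described above, the sketch below averages the scores that several QA models assign to each candidate distractor and splits the candidates into two difficulty levels. It is only a minimal sketch under stated assumptions: the function names, the fixed threshold, the example scores, and the premise that a distractor the QA models find more plausible counts as harder are all hypothetical, and a simple threshold stands in for the clustering the paper performs.

```python
from statistics import mean


def ensemble_score(probs_per_model):
    """Average the score each QA model assigns to one candidate distractor.

    Assumption: each score is the probability a QA model gives to the
    distractor being the answer to the cloze blank.
    """
    return mean(probs_per_model)


def label_difficulty(candidates, threshold=0.1):
    """Assign a two-level difficulty label to every candidate distractor.

    candidates: dict mapping distractor -> list of per-model scores.
    threshold: hypothetical cut-off; a stand-in for the paper's clustering.
    """
    labels = {}
    for word, probs in candidates.items():
        score = ensemble_score(probs)
        # Assumed rule: distractors the ensemble finds plausible are "hard".
        labels[word] = "hard" if score >= threshold else "easy"
    return labels


# Hypothetical scores from three QA models for two candidate distractors.
example = {
    "walk": [0.30, 0.20, 0.25],
    "banana": [0.01, 0.02, 0.00],
}
labels = label_difficulty(example, threshold=0.1)
```

In this toy input, a distractor that every model scores near zero falls below the threshold and is labelled easy, while one that the models find plausible is labelled hard. The actual scoring models and cluster boundaries are not specified in this section.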

Paper Structure

This paper contains 41 sections, 9 figures, 16 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of the dataset augmentation pipeline.
  • Figure 2: Overview of training methods of difficulty-controllable cloze distractor generation model. Differences in the template are highlighted in red.
  • Figure 3: QA system annotation scores across two difficulty levels. The line represents the continuous distribution of the data using KDE.
  • Figure 4: Duplication rate of generated distractors with the correct answers.
  • Figure 5: Histogram difference of QA ensemble system scores between two generation methods. Positive values (blue) indicate higher frequencies for the answer generator, while negative values (red) indicate higher frequencies for the distractor generator.
  • ...and 4 more figures