ISSR: Iterative Selection with Self-Review for Vocabulary Test Distractor Generation
Yu-Cheng Liu, An-Zi Yen
TL;DR
The paper tackles automatic distractor generation for vocabulary tests by introducing the Iterative Selection with Self-Review (ISSR) framework. ISSR splits the process into three modules: a candidate generator based on a pretrained language model (PLM), a distractor selector based on a large language model (LLM), and a validator whose self-review mechanism checks that each question keeps a single correct answer. Empirical results on vocabulary items from the GSAT, one of Taiwan's university entrance exams, show that ISSR outperforms baselines and that a well-curated candidate pool combined with binary-choice self-review yields the best validity and quality. The approach reduces teacher effort, remains adaptable across exams, and avoids extensive fine-tuning, though it requires more computation and depends on the quality of candidate generation. The work highlights practical pathways for using LLMs in test item design and identifies areas for future improvement, such as polysemy handling and generalization.
Abstract
Vocabulary acquisition is essential to second language learning, as it underpins all core language skills. Accurate vocabulary assessment is particularly important in standardized exams, where test items evaluate learners' comprehension and contextual use of words. Previous research has explored methods for generating distractors to aid in the design of English vocabulary tests. However, current approaches often rely on lexical databases or predefined rules, and frequently produce distractors that risk invalidating the question by introducing multiple correct options. In this study, we focus on English vocabulary questions from Taiwan's university entrance exams. We analyze student response distributions to gain insights into the characteristics of these test items and provide a reference for future research. Additionally, we identify key limitations in how large language models (LLMs) support teachers in generating distractors for vocabulary test design. To address these challenges, we propose the iterative selection with self-review (ISSR) framework, which makes use of a novel LLM-based self-review mechanism to ensure that the distractors remain valid while offering diverse options. Experimental results show that ISSR achieves promising performance in generating plausible distractors, and the self-review mechanism effectively filters out distractors that could invalidate the question.
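The generate, select, and validate loop summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `issr_sketch`, its parameters, and the three callables are hypothetical, and the toy stand-ins below replace the PLM and LLM calls with fixed rules. How the paper actually prompts its models is not described in this summary.

```python
from typing import Callable, List

# Hypothetical signatures for the three modules (names are assumptions).
Generator = Callable[[str, str], List[str]]              # (stem, answer) -> candidate pool
Selector = Callable[[str, str, List[str], List[str]], str]  # picks one word from the pool
Reviewer = Callable[[str, str], bool]                    # True if the word also fits the blank


def issr_sketch(
    stem: str,
    answer: str,
    generate_candidates: Generator,
    select_distractor: Selector,
    is_also_correct: Reviewer,
    n_distractors: int = 3,
    max_rounds: int = 10,
) -> List[str]:
    """Pick distractors one at a time, keeping only those that pass self-review."""
    # Stage 1: build the candidate pool, excluding the answer itself.
    pool = [c for c in generate_candidates(stem, answer) if c != answer]
    chosen: List[str] = []
    for _ in range(max_rounds):
        if len(chosen) == n_distractors or not pool:
            break
        # Stage 2: the selector proposes one candidate from the remaining pool.
        pick = select_distractor(stem, answer, pool, chosen)
        if pick not in pool:
            continue  # ignore a proposal that is not in the pool
        pool.remove(pick)
        # Stage 3: binary self-review; reject words that would also be correct.
        if not is_also_correct(stem, pick):
            chosen.append(pick)
    return chosen


# Toy stand-ins for the model calls (purely illustrative).
def toy_generator(stem: str, answer: str) -> List[str]:
    return ["harsh", "gentle", "severe", "narrow", "casual", "remote"]


def toy_selector(stem: str, answer: str, pool: List[str], chosen: List[str]) -> str:
    return pool[0]  # a real selector would ask an LLM to rank the pool


def toy_reviewer(stem: str, word: str) -> bool:
    return word in {"harsh"}  # pretend "harsh" also fits the blank


result = issr_sketch(
    "The weather was so ___ that we stayed indoors.",
    "severe",
    toy_generator,
    toy_selector,
    toy_reviewer,
)
print(result)  # ['gentle', 'narrow', 'casual']
```

In this toy run, "harsh" is proposed first and rejected by the reviewer because it would give the question a second correct answer, which is the failure mode the self-review step is meant to catch. Passing the modules in as callables mirrors the modular design described in the summary, so a different generator or selector could be swapped in without touching the loop.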
