Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration
Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan
TL;DR
This work tackles distractor generation (DG) for multiple-choice questions (MCQs) by introducing Retrieval Augmented Pretraining (RAP) and Knowledge Augmented Generation (KAG). RAP aligns pretraining with the DG task by creating pseudo questions from sentences retrieved using the correct answer, while KAG injects structured knowledge triplets from knowledge graphs (KGs) during generation. Experiments on the SciQ and MCQ datasets show state-of-the-art F1@3 scores, with RAP and KG-based augmentation each providing substantial gains, and with cross-domain benefits when pretraining is done on larger, domain-diverse datasets. The findings highlight the value of task-specific pretraining and structured knowledge for generating higher-quality distractors, though combining the two approaches can introduce noise and calls for more refined triplet ranking and integration strategies.
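As a rough illustration of the pseudo-question idea behind RAP, the sketch below retrieves sentences that mention the correct answer and blanks the answer out. The summary does not specify the retriever or the question format, so the keyword-match retrieval, the blank-filling format, and the names `build_pseudo_questions`, `max_sentences`, and `blank` are all assumptions made here for illustration, not the authors' implementation.

```python
import re


def build_pseudo_questions(answer, corpus, max_sentences=2, blank="____"):
    """Turn sentences that mention `answer` into cloze-style pseudo questions.

    Assumption: retrieval is a plain whole-word, case-insensitive match,
    standing in for whatever retriever the paper actually uses.
    """
    pattern = re.compile(r"\b" + re.escape(answer) + r"\b", re.IGNORECASE)
    pseudo_questions = []
    for sentence in corpus:
        if pattern.search(sentence):
            # Replace the answer with a blank to form the pseudo question.
            pseudo_questions.append(pattern.sub(blank, sentence))
        if len(pseudo_questions) == max_sentences:
            break
    return pseudo_questions


corpus = [
    "Mitochondria produce most of the cell's ATP.",
    "The nucleus stores DNA.",
    "ATP is consumed during muscle contraction.",
]
questions = build_pseudo_questions("ATP", corpus)
```

Each resulting pseudo question, paired with its answer, could then serve as a pretraining example whose form resembles the downstream DG input.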
Abstract
In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval augmented pretraining, which refines language model pretraining to align it more closely with the downstream DG task. Second, we explore the integration of knowledge graphs to enhance DG performance. Through experiments on benchmark datasets, we show that our models significantly outperform previous state-of-the-art results. Our best-performing model advances the F1@3 score from 14.80 to 16.47 on the MCQ dataset and from 15.92 to 16.50 on the SciQ dataset.
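For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute F1@k for a single question, assuming exact string matching after lowercasing and trimming. The abstract does not define the metric precisely, so the matching rule and the function name `f1_at_k` are assumptions; the reported scores would correspond to an average over questions, scaled to percentages.

```python
def f1_at_k(generated, reference, k=3):
    """F1 between the top-k generated distractors and the reference distractors.

    Assumption: a generated distractor counts as a hit only if it exactly
    matches a reference distractor after lowercasing and stripping whitespace.
    """
    top_k = {d.strip().lower() for d in generated[:k]}
    gold = {d.strip().lower() for d in reference}
    if not top_k or not gold:
        return 0.0
    hits = len(top_k & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


score = f1_at_k(
    ["gravity", "Friction", "mass"],
    ["friction", "inertia", "mass"],
)
```

In this example two of the three generated distractors match the references, so precision and recall are both 2/3 and the F1@3 score is 2/3.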
