Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

Han-Cheng Yu; Yu-An Shih; Kin-Man Law; Kai-Yu Hsieh; Yu-Chen Cheng; Hsin-Chih Ho; Zih-An Lin; Wen-Chuan Hsu; Yao-Chung Fan

TL;DR

This work tackles distractor generation for MCQs by introducing Retrieval Augmented Pretraining (RAP) and Knowledge Augmented Generation (KAG). RAP aligns pretraining with the DG task by creating pseudo questions from retrieved sentences based on the correct answer, while KAG injects structured knowledge triplets from Knowledge Graphs during generation. Empirical results on SciQ and MCQ show state-of-the-art improvements in F1@3, with RAP and KG-based augmentations providing substantial gains and cross-domain benefits when pretraining is done on larger, domain-diverse datasets. The findings highlight the value of task-specific pretraining and structured knowledge for generating higher-quality distractors, though combining the two approaches can introduce noise and requires more refined triplet ranking and integration strategies.
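The pseudo-question construction behind RAP can be illustrated with a minimal sketch. It assumes a toy in-memory corpus, simple string matching as the retriever, and a cloze-style `[MASK]` placeholder; the function name `build_pseudo_questions`, the `max_sentences` parameter, and the masking scheme are hypothetical choices made for illustration, not the authors' implementation.

```python
import re

MASK = "[MASK]"  # hypothetical placeholder token, chosen for illustration

def build_pseudo_questions(answer, corpus, max_sentences=3):
    """Return (pseudo_question, answer) pairs from sentences that mention the answer.

    Retrieval here is plain case-insensitive word matching, which stands in
    for whatever retriever is actually used.
    """
    pattern = re.compile(r"\b" + re.escape(answer) + r"\b", re.IGNORECASE)
    pairs = []
    for sentence in corpus:
        if pattern.search(sentence):
            # Blank out the answer so the sentence reads like a cloze question.
            pairs.append((pattern.sub(MASK, sentence), answer))
        if len(pairs) == max_sentences:
            break
    return pairs

# Toy corpus (made up for this example).
corpus = [
    "Mitochondria produce most of the cell's ATP.",
    "The nucleus stores genetic material.",
    "Plant cells contain chloroplasts and mitochondria.",
]
pairs = build_pseudo_questions("mitochondria", corpus)
for question, ans in pairs:
    print(question, "->", ans)
```

Each resulting pair could then serve as a pseudo training instance whose format resembles the downstream DG input; how the pretraining targets are chosen is not specified in this summary and is left out of the sketch.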

Abstract

In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval augmented pretraining, which refines language model pretraining to align it more closely with the downstream DG task. Second, we explore the integration of knowledge graphs to enhance DG performance. In experiments on benchmark datasets, our models significantly outperform previous state-of-the-art results. Our best-performing model raises the F1@3 score from 14.80 to 16.47 on the MCQ dataset and from 15.92 to 16.50 on the SciQ dataset.

Paper Structure (33 sections, 4 equations, 3 figures, 6 tables)

Introduction
Related Work
Distractor Generation
Task-specific Pretraining
Knowledge Augmented Generation
Methodology
Retrieval Augmented Pretraining
Alternatives for RAP Training Setting
Boosting RAP with ground-truth distractor
Knowledge Augmented Generation
Retrieving Triplet from KG
Triplet Ranker
KAG Training
Experiment
Dataset
...and 18 more sections

Figures (3)

Figure 1: Retrieval Augmented Pretraining
Figure 2: Knowledge Augmented Generation
Figure 3: Retrieving triplets from the KG: we extract keywords from a given question, answer, and candidate set to form an entity set $W$, then retrieve the relevant triplets from the KG using these keyword entities
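
The retrieval step described in the Figure 3 caption can be sketched as follows. This is a simplified illustration under stated assumptions: the KG is a small in-memory list of (head, relation, tail) tuples, keyword extraction is lowercase tokenization with a tiny stopword list, and a triplet counts as relevant when its head or tail appears in $W$. The names `extract_entities`, `retrieve_triplets`, and `STOPWORDS` are hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a proper keyword extractor.
STOPWORDS = {"the", "a", "an", "of", "is", "what", "which", "in", "to", "and"}

def extract_entities(question, answer, candidates):
    """Build the entity set W from the question, answer, and candidate set."""
    text = " ".join([question, answer, *candidates]).lower()
    return {tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS}

def retrieve_triplets(entity_set, kg):
    """Keep triplets whose head or tail entity appears in the entity set."""
    return [t for t in kg if t[0] in entity_set or t[2] in entity_set]

# Toy knowledge graph (made up for this example).
kg = [
    ("heart", "part_of", "circulatory system"),
    ("lung", "part_of", "respiratory system"),
    ("rock", "made_of", "minerals"),
]
W = extract_entities("Which organ pumps blood?", "heart", ["lung", "liver"])
triplets = retrieve_triplets(W, kg)
print(triplets)
```

The retrieved set would then be passed to a ranking step before generation (the outline lists a "Triplet Ranker" section); ranking is not shown here.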
