Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

Fanyi Qu; Hao Sun; Yunfang Wu

TL;DR

This paper tackles distractor generation (DG) for reading comprehension in an unsupervised setting, removing the need for labor-intensive distractor labels. It introduces an LLM distillation framework that uses pseudo distractors from large models to train a compact Bart-base student via a two-stage dual-task training scheme. To improve distractor quality, it presents counterfactual contrastive decoding with a plausibility constraint, guiding the model toward counterfactual yet plausible distractors. Empirical results on RACE and DREAM show that the unsupervised method surpasses zero-shot LLM baselines and approaches fully supervised models while using far fewer parameters, a cost-efficient path for real-world reading comprehension systems. Overall, the approach offers a scalable way to generate distractors without heavy annotation or large-scale models.

Abstract

Within the context of reading comprehension, the task of Distractor Generation (DG) aims to generate several incorrect options to confuse readers. Traditional supervised methods for DG rely heavily on expensive human-annotated distractor labels. In this paper, we propose an unsupervised DG framework, leveraging Large Language Models (LLMs) as cost-effective annotators to enhance the DG capability of smaller student models. Specifically, to perform knowledge distillation, we propose a dual-task training strategy that integrates pseudo distractors from LLMs and the original answer information as the objective targets in a two-stage training process. Moreover, we devise a counterfactual contrastive decoding mechanism to increase the distracting capability of the DG model. Experiments show that our unsupervised generation method with Bart-base greatly surpasses GPT-3.5-turbo performance with 200 times fewer model parameters. Our proposed unsupervised DG method offers a cost-effective framework for practical reading comprehension applications, without the need for laborious distractor annotation or costly large-size models.
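The abstract names contrastive decoding with a plausibility constraint but does not spell out the scoring rule. As a rough illustration only, the sketch below shows the generic contrastive decoding idea for a single decoding step: a token is rewarded for being likely under one distribution and penalized for being likely under another, and tokens that are too unlikely under the first distribution are masked out. The function names, the `alpha` and `lam` parameters, and the toy probabilities are all assumptions made for this example; the paper's counterfactual variant may define the two distributions and the constraint differently.

```python
import math


def contrastive_scores(log_p_main, log_p_contrast, alpha=0.1, lam=1.0):
    """Score each candidate token for one decoding step.

    log_p_main:     log-probabilities from the distribution to favor.
    log_p_contrast: log-probabilities from the distribution to penalize.
    alpha:          plausibility cutoff (hypothetical name); a token is kept
                    only if p_main(token) >= alpha * max(p_main).
    lam:            weight of the penalty term (hypothetical name).
    """
    # Plausibility threshold expressed in log space.
    threshold = max(log_p_main) + math.log(alpha)
    scores = []
    for lp_main, lp_con in zip(log_p_main, log_p_contrast):
        if lp_main < threshold:
            # Implausible token: never selected, whatever the contrast says.
            scores.append(float("-inf"))
        else:
            scores.append(lp_main - lam * lp_con)
    return scores


def pick_token(log_p_main, log_p_contrast, alpha=0.1, lam=1.0):
    """Greedy choice: index of the highest-scoring plausible token."""
    scores = contrastive_scores(log_p_main, log_p_contrast, alpha, lam)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 4-token vocabulary (made-up numbers, not from the paper).
example_main = [math.log(p) for p in (0.5, 0.3, 0.19, 0.01)]
example_contrast = [math.log(p) for p in (0.7, 0.2, 0.0999, 0.0001)]

chosen_with_constraint = pick_token(example_main, example_contrast, alpha=0.1)
chosen_without_constraint = pick_token(example_main, example_contrast, alpha=1e-6)
```

In this toy example the last token has a very large contrast ratio but is itself improbable, so it wins when the cutoff is effectively disabled and is excluded when `alpha=0.1`. That is the role a plausibility constraint plays in this generic formulation: it keeps the contrast term from promoting tokens the main distribution considers unlikely.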

Paper Structure (37 sections, 8 equations, 3 figures, 13 tables)

This paper contains 37 sections, 8 equations, 3 figures, 13 tables.

Introduction
Related Work
Distractor Generation
LLM Knowledge Distillation
Method
Task Definition
Generating Distractors with LLMs
Dual Task Training with Student Models
Contrastive Decoding
Counterfactual Contrastive Decoding
Stage 1
Stage 2
Plausibility Constraint
Experimental Setup
Student Model Training
...and 22 more sections

Figures (3)

Figure 1: Performance of different unsupervised methods generating 3 distractors on two datasets. We also display results with the supervised Bart-base model for comparison.
Figure 2: Overview of our proposed unsupervised distractor generation framework, which can be divided into two parts: pseudo distractor generation and dual task training.
Figure 3: Low-resource experimental results on RACE.
