DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension
Runfeng Lin, Dacheng Xu, Huijiang Wang, Zebiao Chen, Yating Wang, Shouqiang Liu
TL;DR
This work tackles the challenge of generating plausible distractors for natural questions in Chinese multi-choice reading comprehension by introducing DGRC, a fine-tuning framework that combines hard chain-of-thought (CoT), multi-task learning, and generation mask patterns. DGRC is evaluated on Chinese datasets drawn from authentic exams (C³ and LogiQA) and shows substantial gains: BLEU scores improve by more than 2.5×, and human evaluators favor DGRC-produced distractors. The analysis indicates that each component (hard CoT, multi-task learning, and end-to-end masking) contributes to distractor quality, with ablation studies highlighting their relative importance. The approach advances natural questions distractor generation (NQDG) in Chinese settings and provides a practical framework for producing exam-aligned distractors for standardized testing.
Abstract
When evaluating a learner's knowledge proficiency, the multiple-choice question is an efficient and widely used format in standardized tests. Nevertheless, generating these questions, particularly plausible distractors (incorrect options), poses a considerable challenge. Distractor generation can generally be classified into cloze-style distractor generation (CDG) and natural questions distractor generation (NQDG). In contrast to CDG, using pre-trained language models (PLMs) for NQDG presents three primary challenges: (1) PLMs are typically trained to generate "correct" content, such as answers, and rarely to generate "plausible" content, such as distractors; (2) PLMs often struggle to produce content that aligns with the specific knowledge and style of exams; (3) NQDG requires the model to produce longer, context-sensitive, and question-relevant distractors. In this study, we introduce DGRC, a fine-tuning framework for NQDG in Chinese multi-choice reading comprehension drawn from authentic examinations. DGRC comprises three major components: hard chain-of-thought, multi-task learning, and generation mask patterns. Experimental results demonstrate that DGRC significantly enhances generation performance, achieving a more than 2.5-fold improvement in BLEU scores.
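The headline result is stated as a ratio of BLEU scores, so it helps to recall what BLEU measures. BLEU combines clipped n-gram precision against reference distractors with a brevity penalty. The abstract does not specify the BLEU configuration (tokenization, maximum n-gram order, smoothing). The sketch below is a plain corpus-level BLEU-4 with uniform weights and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative, and character-level tokens for Chinese text are an assumption, not the paper's stated setup.

```python
from collections import Counter
import math
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU with uniform n-gram weights and a brevity penalty.

    hypotheses: one token list per generated distractor.
    references: for each hypothesis, one or more reference token lists.
    Assumption: no smoothing, so a zero precision at any order yields 0.0.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # number of hypothesis n-grams per order
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length
        # to the hypothesis (ties broken toward the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram by its maximum count in any one reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # geometric mean of precisions collapses to zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the references, else exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    # Character-level tokens for a Chinese distractor (an illustrative assumption).
    hyp = list("今天天气很好")
    ref = list("今天天气很好")
    print(corpus_bleu([hyp], [[ref]]))
```

Because this sketch applies no smoothing, short outputs with no matching 4-gram score exactly zero. Established tools such as sacreBLEU standardize tokenization and smoothing, so any reported BLEU ratio should be read against that tool's configuration.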
