DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension

Runfeng Lin, Dacheng Xu, Huijiang Wang, Zebiao Chen, Yating Wang, Shouqiang Liu

TL;DR

This work addresses the generation of plausible distractors for natural questions in Chinese multi-choice reading comprehension by introducing DGRC, a fine-tuning framework that combines hard chain-of-thought (CoT), multi-task learning, and generation mask patterns. DGRC is evaluated on datasets drawn from authentic Chinese exams (C^3 and LogiQA) and shows substantial gains: BLEU scores improve by more than 2.5× and human evaluation favors DGRC-produced distractors. Ablation studies indicate that each component (hard CoT, multi-task learning, and end-to-end masking) contributes to distractor quality and show their relative importance. The approach advances natural questions distractor generation (NQDG) in Chinese settings and offers a practical framework for producing exam-aligned distractors for standardized tests.

Abstract

When evaluating a learner's knowledge proficiency, the multiple-choice question is an efficient and widely used format in standardized tests. Nevertheless, generating these questions, particularly plausible distractors (incorrect options), poses a considerable challenge. Distractor generation is generally classified into cloze-style distractor generation (CDG) and natural questions distractor generation (NQDG). In contrast to CDG, using pre-trained language models (PLMs) for NQDG presents three primary challenges: (1) PLMs are typically trained to generate "correct" content, such as answers, and rarely trained to generate "plausible" content, such as distractors; (2) PLMs often struggle to produce content that aligns well with specific knowledge and the style of exams; (3) NQDG requires the model to produce longer, context-sensitive, and question-relevant distractors. In this study, we introduce a fine-tuning framework named DGRC for NQDG in Chinese multi-choice reading comprehension from authentic examinations. DGRC comprises three major components: hard chain-of-thought, multi-task learning, and generation mask patterns. The experimental results demonstrate that DGRC significantly enhances generation performance, achieving a more than 2.5-fold improvement in BLEU scores.
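The BLEU result above is a ratio of n-gram overlap scores between generated and reference distractors. As a rough illustration of how such a score and a fold improvement could be computed, the sketch below implements a single-reference, sentence-level BLEU in plain Python. The character-level tokenization, the lack of smoothing, and the function names `char_bleu` and `fold_improvement` are assumptions made for this example; the abstract does not state which BLEU variant or tokenizer the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference.

    Assumption: each character is one token, a common simplification
    for Chinese text that avoids a word segmenter.
    """
    cand, ref = list(candidate), list(reference)
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives zero BLEU
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def fold_improvement(new_score, baseline_score):
    """Ratio of a new score to a baseline score."""
    return new_score / baseline_score
```

Under these assumptions, a generated distractor identical to the reference scores 1.0, and a "2.5-fold improvement" corresponds to `fold_improvement(new, baseline)` returning 2.5 or more.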

Paper Structure (26 sections, 11 equations, 6 figures, 3 tables)


Figures (6)

  • Figure 1: Illustration of the framework: We apply multi-task learning and chain-of-thought (CoT) to the distractor generator, which is based on pre-trained language models. Given the context, question, and answer, the distractor generator generates three distractors sequentially.
  • Figure 2: Illustration of hard CoT.
  • Figure 3: Illustration of few-shot CoT.
  • Figure 4: Templated question (above) and non-templated question (below).
  • Figure 5: Formulation of multi-choice question answering task.
  • ...and 1 more figures