Constructing Cloze Questions Generatively

Yicheng Sun, Jie Wang

TL;DR

Cloze questions rely on plausible distractors, but existing methods struggle with sense alignment and multigram keys. CQG integrates stem/answer-key selection, instance-level distractor generation with a transformer, sense disambiguation via ESCHER, and WordNet-based lexical constraints, followed by a filtering and ranking ADG module to produce high-quality, contextually appropriate distractors, including multigram ones. Empirical results show CQG outperforms prior SOTA methods on unigram tasks and generates semantically aligned multigram distractors, with human judges affirming quality. The approach offers a scalable path for automated, high-quality distractor generation and has been released with data and an API to foster further research and application in MCQ construction.
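Two ideas in this pipeline are drawing instance-level candidates from sibling synsets (co-hyponyms that share a hypernym with the key's sense) and keeping only candidates that satisfy a WordNet-based lexical constraint. The following minimal sketch illustrates both under stated assumptions. A hand-built toy taxonomy (`HYPERNYMS`, `LEXNAME`) replaces WordNet. The helpers `sibling_synsets` and `lexically_constrained` are hypothetical and are not CQG's API. Treating "same lexical label as the key" as the constraint is an illustrative assumption; CQG's exact filtering rules are not given here. With NLTK's WordNet interface, the analogous lookups would be `Synset.hypernyms()`, `Synset.hyponyms()` and `Synset.lexname()`.

```python
from typing import Dict, List

# Hypothetical toy taxonomy standing in for WordNet: synset -> hypernym synsets.
HYPERNYMS: Dict[str, List[str]] = {
    "dog.n.01": ["canine.n.02"],
    "wolf.n.01": ["canine.n.02"],
    "fox.n.01": ["canine.n.02"],
    "plush_dog.n.01": ["canine.n.02"],  # artificial entry to exercise the filter
}

# Hypothetical lexical labels (WordNet calls these lexicographer file names).
LEXNAME: Dict[str, str] = {
    "dog.n.01": "noun.animal",
    "wolf.n.01": "noun.animal",
    "fox.n.01": "noun.animal",
    "plush_dog.n.01": "noun.artifact",
}


def sibling_synsets(key: str) -> List[str]:
    """Return synsets sharing at least one hypernym with `key` (its co-hyponyms)."""
    parents = set(HYPERNYMS.get(key, []))
    return sorted(
        synset
        for synset, hypers in HYPERNYMS.items()
        if synset != key and parents.intersection(hypers)
    )


def lexically_constrained(key: str, candidates: List[str]) -> List[str]:
    """Keep only candidates whose lexical label matches the label of `key`."""
    label = LEXNAME.get(key)
    return [c for c in candidates if LEXNAME.get(c) == label]


siblings = sibling_synsets("dog.n.01")
print(siblings)  # ['fox.n.01', 'plush_dog.n.01', 'wolf.n.01']
print(lexically_constrained("dog.n.01", siblings))  # ['fox.n.01', 'wolf.n.01']
```

Siblings are plausible distractors because they belong to the same category as the key without being synonyms of it. The label check then removes siblings from an unrelated semantic field.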

Abstract

We present a generative method called CQG for constructing cloze questions from a given article using neural networks and WordNet, with an emphasis on generating multigram distractors. Built on sense disambiguation, text-to-text transformation, and WordNet's synset taxonomies and lexical labels, CQG selects an answer key for a given sentence, segments it into a sequence of instances, and generates instance-level distractor candidates (IDCs) using a transformer and sibling synsets. It then removes inappropriate IDCs, ranks the remaining IDCs based on contextual embedding similarities as well as synset and lexical relatedness, forms distractor candidates by combinatorially replacing instances with the corresponding top-ranked IDCs, and checks whether they are legitimate phrases. Finally, it selects the top-ranked distractor candidates based on their contextual semantic similarities to the answer key. Experiments show that this method significantly outperforms previous SOTA results. Human judges also confirm the high quality of the generated distractors.
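The final stages described above can be sketched as follows: combinatorially replace instances with their top-ranked IDCs, drop illegitimate phrases, and rank the survivors by similarity to the answer key. This is a minimal sketch with loud assumptions. Toy 2-D word vectors averaged into phrase vectors replace CQG's contextual embeddings. A fixed set `KNOWN_PHRASES` replaces its legitimate-phrase check. The IDC lists are assumed to be already filtered and ranked. All names (`WORD_VEC`, `KNOWN_PHRASES`, `phrase_vec`, `cosine`, `distractors`) are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence

# Hypothetical toy word vectors; CQG itself uses contextual embeddings.
WORD_VEC: Dict[str, List[float]] = {
    "red": [1.0, 0.0], "gray": [0.8, 0.3], "blue": [0.1, 0.2],
    "fox": [0.2, 1.0], "wolf": [0.3, 0.9], "cat": [0.7, 0.2],
}

# Hypothetical stand-in for the "legitimate phrase" check.
KNOWN_PHRASES = {"gray fox", "red wolf", "gray wolf", "red cat"}


def phrase_vec(phrase: str) -> List[float]:
    """Average the word vectors of a phrase (a simplification, not CQG's encoder)."""
    vecs = [WORD_VEC[w] for w in phrase.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distractors(instances: List[str], idcs: List[List[str]], top_n: int) -> List[str]:
    """Form candidates from instance/IDC combinations, filter them, and rank them."""
    # Each position either keeps its original instance or takes one of its IDCs.
    options = [[inst] + cands for inst, cands in zip(instances, idcs)]
    key = " ".join(instances)
    candidates = []
    for combo in itertools.product(*options):
        phrase = " ".join(combo)
        # Skip the answer key itself and phrases failing the legitimacy check.
        if phrase != key and phrase in KNOWN_PHRASES:
            candidates.append(phrase)
    key_vec = phrase_vec(key)
    # Most similar to the answer key first.
    candidates.sort(key=lambda p: cosine(phrase_vec(p), key_vec), reverse=True)
    return candidates[:top_n]


print(distractors(["red", "fox"], [["gray", "blue"], ["wolf", "cat"]], top_n=3))
# ['red wolf', 'gray wolf', 'gray fox']
```

`itertools.product` enumerates every mix of kept instances and substituted IDCs, so the candidate count grows multiplicatively with the number of instances. Restricting each position to its top-ranked IDCs keeps that product small before the more expensive phrase check and similarity ranking.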

Paper Structure

This paper contains 25 sections, 6 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: CQG architecture and data flow
  • Figure 2: SAS sub-components and data flow
  • Figure 3: IDG sub-components and data flow
  • Figure 4: ADG sub-components and data flow
  • Figure 5: Distribution of ground-truth distractors predicted