Diversifying Question Generation over Knowledge Base via External Natural Questions
Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen
TL;DR
Knowledge base question generation (KBQG) has traditionally pursued the quality of a single generated question. This work argues instead for expressive diversity that preserves semantics, and introduces the Diverse@k metric to quantify per-instance diversity among the top-k generated questions under a ground-truth relevance constraint. It proposes a dual forward/backward framework that injects diverse expressions from external natural questions via two simple pseudo-pair selection strategies, enabling richer paraphrase-like alternatives without relying solely on paraphrase models. Experiments on WebQuestions and PathQuestions show clear diversity gains over PLM baselines and performance competitive with ChatGPT while maintaining relevant semantics, which supports the approach's practical value for downstream QA and evaluation tasks. Overall, the paper advances KBQG by combining external linguistic diversity with data augmentation to yield more human-like, diverse questions and improved downstream usefulness.
Abstract
Previous methods for knowledge base question generation (KBQG) primarily focus on enhancing the quality of a single generated question. Recognizing the remarkable paraphrasing ability of humans, we contend that diverse texts should convey the same semantics through varied expressions. This insight makes diversifying question generation an intriguing task, whose first challenge is how to evaluate diversity. Current metrics assess this kind of diversity inadequately because they calculate the ratio of unique n-grams within each generated question itself, which measures duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric that measures the diversity among the top-k generated questions for each instance while ensuring their relevance to the ground truth. The second challenge is how to generate more diverse questions. To address it, we introduce a dual model framework, interwoven with two selection strategies, that generates diverse questions by leveraging external natural questions. The main idea of the framework is to extract more diverse expressions and integrate them into the generation model. Extensive experiments on widely used KBQG benchmarks demonstrate that our approach generates highly diverse questions and improves the performance of question answering tasks.
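The contrast the abstract draws between the two kinds of metric can be illustrated with a minimal sketch. It is not the paper's definition of the proposed metric: the function names, the Jaccard token overlap used for both similarity and relevance, and the `min_relevance` threshold are all assumptions chosen for illustration. `distinct_n` computes the unique n-gram ratio inside one question, while `topk_diversity` averages pairwise dissimilarity among the top-k candidates that stay close enough to the ground-truth question.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(question, n=1):
    """Ratio of unique n-grams to all n-grams within a single question."""
    grams = ngrams(question.lower().split(), n)
    return len(set(grams)) / len(grams) if grams else 0.0


def jaccard(a, b):
    """Token-set Jaccard similarity; a stand-in for any similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def topk_diversity(candidates, reference, min_relevance=0.3):
    """Hypothetical sketch: mean pairwise dissimilarity among the candidates
    whose similarity to the reference is at least min_relevance (an assumed
    threshold, not a value taken from the paper)."""
    relevant = [q for q in candidates if jaccard(q, reference) >= min_relevance]
    pairs = list(combinations(relevant, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    reference = "who directed the film titanic"
    copies = [reference] * 3
    paraphrases = [
        "who directed titanic",
        "who was the director of the film titanic",
    ]
    # Each copy has no repeated unigram, so its within-question score is maximal,
    # yet the three copies are identical to one another.
    print(distinct_n(reference, n=1), topk_diversity(copies, reference))
    print(topk_diversity(paraphrases, reference))
```

In this toy setting, three identical copies of a question each receive a perfect within-question score but a top-k diversity of zero, which is the gap the abstract points to. An off-topic candidate such as "what is the capital of france" falls below the assumed relevance threshold and does not contribute to the score.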
