ConQuer: A Framework for Concept-Based Quiz Generation

Yicheng Fu, Zikui Wang, Liuxin Yang, Meiqing Huo, Zhongdongming Dai

TL;DR

ConQuer addresses the quality gap in AI-generated quizzes by grounding content in external knowledge through concept extraction and retrieval from sources such as Wikipedia and ConceptNet. The framework combines retrieval-augmented generation, summarization of the retrieved material, and a concept-aware quiz generator, and its output is evaluated by large language models acting as judges. It achieves a 4.8% improvement in evaluation scores and a 77.52% win rate in pairwise comparisons against baseline quiz sets, and ablation studies confirm that each component contributes. The work provides an open-source pipeline and dataset, enabling scalable, pedagogically grounded quiz generation across diverse subjects and education levels.
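
The stages named above (concept extraction, retrieval, summarization, quiz generation) can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `KNOWLEDGE_BASE` stands in for Wikipedia or ConceptNet, a stopword filter stands in for LLM-based concept extraction, bag-of-words cosine similarity stands in for semantic similarity, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data standing in for external sources such as Wikipedia or ConceptNet.
KNOWLEDGE_BASE = {
    "photosynthesis": "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "chlorophyll": "Chlorophyll is the green pigment that absorbs light in plant cells.",
    "mitosis": "Mitosis is the process by which a cell divides into two identical daughter cells.",
}

STOPWORDS = {"how", "does", "do", "the", "a", "an", "in", "of", "and",
             "what", "is", "to", "why", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_concepts(question):
    """Assumption: a stopword filter replaces the LLM-based concept extractor."""
    return [tok for tok in tokenize(question) if tok not in STOPWORDS]


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine similarity, a crude proxy for semantic similarity."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(concepts, top_k=2):
    """Return the titles of the top_k entries most similar to the concepts."""
    scored = [
        (cosine(concepts, tokenize(title + " " + text)), title)
        for title, text in KNOWLEDGE_BASE.items()
    ]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]


def summarize(titles):
    """Assumption: concatenation replaces an LLM summarizer."""
    return " ".join(KNOWLEDGE_BASE[title] for title in titles)


def build_quiz_prompt(question, concepts, summary):
    """Assemble the text that would be sent to a quiz-generating LLM."""
    return (
        f"Student question: {question}\n"
        f"Key concepts: {', '.join(concepts)}\n"
        f"Background: {summary}\n"
        "Write a short quiz that tests these concepts."
    )


if __name__ == "__main__":
    q = "How does photosynthesis work in plants with chlorophyll?"
    found = extract_concepts(q)
    print(build_quiz_prompt(q, found, summarize(retrieve(found))))
```

In a real system each placeholder would be replaced by an LLM call or an embedding-based retriever; the sketch only shows how the stages hand data to one another.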

Abstract

Quizzes play a crucial role in education by reinforcing students' understanding of key concepts and encouraging self-directed exploration. However, compiling high-quality quizzes can be challenging and requires deep expertise and insight into the specific subject matter. Although LLMs have greatly enhanced the efficiency of quiz generation, concerns remain regarding the quality of these AI-generated quizzes and their educational impact on students. To address these issues, we introduce ConQuer, a concept-based quiz generation framework that leverages external knowledge sources. We employ comprehensive evaluation dimensions to assess the quality of the generated quizzes, using LLMs as judges. Our experimental results demonstrate a 4.8% improvement in evaluation scores and a 77.52% win rate in pairwise comparisons against baseline quiz sets. Ablation studies further underscore the effectiveness of each component in our framework. Code is available at https://github.com/sofyc/ConQuer.
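
For readers unfamiliar with the two reported metrics, the following sketch shows one plausible way such numbers are computed. It rests on assumptions the abstract does not confirm: that the win rate is the share of pairwise judgments favoring ConQuer with ties kept in the denominator, and that the improvement is relative to the baseline score. The labels, function names, and sample values are hypothetical and are not results from the paper.

```python
def win_rate(judgments, winner_label="conquer"):
    """Percentage of pairwise judgments won by `winner_label`.

    Assumption: ties and losses both stay in the denominator.
    """
    if not judgments:
        raise ValueError("no judgments to aggregate")
    wins = sum(1 for verdict in judgments if verdict == winner_label)
    return 100.0 * wins / len(judgments)


def relative_improvement(baseline_score, new_score):
    """Percentage change of new_score relative to baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Hypothetical judge verdicts for four quiz pairs.
    verdicts = ["conquer", "conquer", "baseline", "conquer"]
    print(win_rate(verdicts))
    # Hypothetical scores on a 0-100 scale.
    print(relative_improvement(80.0, 84.0))
```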

Paper Structure

This paper contains 18 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: The ConQuer Framework. First, key concepts are extracted from student questions, followed by retrieving relevant information from external knowledge sources based on semantic similarity. Finally, the main topics are summarized to generate personalized quizzes.
  • Figure 2: Student Question Difficulty Vs. Area
  • Figure 3: Student Question Difficulty Vs. Education Level
  • Figure 4: Evaluation score comparison between the baseline and ConQuer with GPT-4o-mini. The evaluation score has been normalized to a scale of 100.
  • Figure 5: Win rate from pairwise comparison between the baseline and ConQuer with GPT-4o-mini
  • ...and 3 more figures