How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes
Sabina Elkins, Ekaterina Kochmar, Jackie C. K. Cheung, Iulian Serban
TL;DR
Educational question generation (EQG) using large language models (LLMs) can reduce teacher workload while preserving quiz quality. This paper uses GPT-3.5 with two prompting strategies (simple prompts and controlled prompts driven by Bloom's taxonomy) to generate questions from context passages. The questions are evaluated in teacher-centered experiments that compare handwritten, simple, and controlled quizzes. Results show that teachers prefer quizzes incorporating generated questions and that quality remains comparable to handwritten quizzes, with Bloom-aligned control sometimes improving usefulness and coverage. The findings support scalable, pedagogically aligned deployment of EQG in classrooms, while highlighting the need to extend evaluations to more domains and to student outcomes.
Abstract
Question generation (QG) is a natural language processing task with an abundance of potential benefits and use cases in the educational domain. For this potential to be realized, QG systems must be designed and validated with pedagogical needs in mind. However, little research has assessed or designed QG approaches with input from real teachers or students. This paper applies a QG approach based on a large language model, in which questions are generated with learning goals derived from Bloom's taxonomy. The automatically generated questions are used in multiple experiments designed to assess how teachers use them in practice. The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes have no loss in quality compared to handwritten versions. Further, several metrics indicate that automatically generated questions can even improve the quality of the quizzes created, showing the promise of large-scale use of QG in the classroom setting.
