Generative AI for Multiple Choice STEM Assessments
Christina Perdikoulias, Chad Vance, Stephen M. Watt
TL;DR
The paper tackles the challenge of generating high-quality, mathematically valid distractors for STEM multiple-choice assessments using generative AI. It proposes a constrained workflow within the Möbius platform, leveraging a semantic math engine and structured prompts to bound AI outputs, preserve mathematical semantics, and enable robust validation. Key contributions include a detailed platform architecture, a prompt-design strategy for credible distractors, and a multi-layer validation pipeline that checks encoding fidelity, JSON parsing, API error handling, and mathematical correctness. The approach demonstrates that, when integrated with discipline-specific tools and validation, AI can substantially speed up content creation while maintaining academic rigor and assessment validity, with implications for scalable, iterative STEM instruction.
Abstract
Artificial intelligence (AI) technology enables a range of enhancements in computer-aided instruction, from accelerating the creation of teaching materials to customizing learning paths based on learner outcomes. However, ensuring the mathematical accuracy and semantic integrity of generative AI output remains a significant challenge, particularly in Science, Technology, Engineering and Mathematics (STEM) disciplines. In this study, we explore the use of generative AI in which "hallucinations", typically viewed as undesirable inaccuracies, can instead serve a pedagogical purpose. Specifically, we investigate the generation of plausible but incorrect alternatives for multiple-choice assessments, where credible distractors are essential for effective assessment design. We describe the Möbius platform for online instruction, with particular focus on its architecture for handling mathematical elements through specialized semantic packages that support dynamic, parameterized STEM content. We examine methods for crafting prompts that interact effectively with these mathematical semantics to guide the AI in generating high-quality multiple-choice distractors. Finally, we demonstrate how this approach reduces the time and effort associated with creating robust teaching materials while maintaining academic rigor and assessment validity.
