Connecting Feedback to Choice: Understanding Educator Preferences in GenAI vs. Human-Created Lesson Plans in K-12 Education -- A Comparative Analysis
Shawon Sarkar, Min Sun, Alex Liu, Zewei Tian, Lief Esbenshade, Jian He, Zachary Zhang
TL;DR
The paper investigates educator preferences for GenAI-generated versus human-authored K-12 math lesson plans, using a large-scale, blinded, pairwise preference study spanning elementary through high school. It compares plans from a customized GPT-4 model, a domain-tuned LLaMA-2-13b model, and human designers on four measures (warm-up, main tasks, cool-down, and overall quality) and reports nuanced grade-level differences. Findings show human-authored plans remain generally preferred, especially in elementary education, but GenAI approaches become more competitive for higher-grade content and cool-down tasks, with domain-specific fine-tuning further enhancing performance. The study argues for human–AI collaboration in curriculum design, emphasizing targeted tuning, educator feedback loops, and careful integration to balance efficiency with pedagogical depth and inclusivity.
Abstract
As generative AI (GenAI) models are increasingly explored for educational applications, understanding educator preferences for AI-generated lesson plans is critical for their effective integration into K-12 instruction. This exploratory study compares lesson plans authored by human curriculum designers, a fine-tuned LLaMA-2-13b model trained on K-12 content, and a customized GPT-4 model, evaluating their pedagogical quality on four measures: warm-up activities, main tasks, cool-down activities, and overall quality. Using a large-scale preference study with K-12 math educators, we examine how preferences vary across grade levels and instructional components, combining quantitative and qualitative analyses. The raw preference results indicate that human-authored lesson plans are generally favored, particularly for elementary education, where educators emphasize student engagement, scaffolding, and collaborative learning. However, AI-generated lesson plans become increasingly competitive in cool-down tasks and structured learning activities, particularly in high school settings. Beyond the quantitative results, we conduct a thematic analysis using Latent Dirichlet Allocation (LDA) and manual coding to identify key factors influencing educator preferences. Educators value human-authored plans for their nuanced differentiation, real-world contextualization, and facilitation of student discourse. Meanwhile, AI-generated lesson plans are often praised for their structure and their adaptability to specific instructional tasks. The findings suggest a human-AI collaborative approach to lesson planning, in which GenAI serves as an assistive tool rather than a replacement for educator expertise. This study contributes to the growing discourse on responsible AI integration in education, highlighting both opportunities and challenges in leveraging GenAI for curriculum development.
