Evaluating Contextually Personalized Programming Exercises Created with Generative AI
Evanfiya Logacheva, Arto Hellas, James Prather, Sami Sarsa, Juho Leinonen
TL;DR
The paper investigates generating contextually personalized programming exercises with GPT-4 for an elective introductory Dart course, evaluating their quality via expert rubrics and student feedback, and analyzing how students interact with the AI-generated material. The approach combines prompt engineering, a Dart-based generation pipeline, and an integrated tool that presents exercises with theming and difficulty controls; the evaluation covers authenticity, usefulness, and engagement. Findings show generally high-quality content with strong theme and topic alignment and positive student reception, though many exercises are only shallowly personalized and some do not match the intended difficulty. The work demonstrates that AI-generated personalized exercises can be a scalable, engaging supplement for introductory programming education, with implications for instructional design and future enhancements such as on-demand generation and deeper personalization.
Abstract
Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affects their engagement. However, creating a varied and comprehensive set of programming exercises for students to practice on is a time-consuming and laborious task for computer science educators. Previous studies have shown that large language models can generate conceptually and contextually relevant programming exercises. Thus, they offer a possibility to automatically produce personalized programming problems to fit students' interests and needs. This article reports on a user study conducted in an elective introductory programming course that included contextually personalized programming exercises created with GPT-4. The quality of the exercises was evaluated by both the students and the authors. Additionally, this work investigated student attitudes towards the created exercises and their engagement with the system. The results demonstrate that the quality of exercises generated with GPT-4 was generally high. What is more, the course participants found them engaging and useful. This suggests that AI-generated programming problems can be a worthwhile addition to introductory programming courses, as they provide students with a practically unlimited pool of practice material tailored to their personal interests and educational needs.
