Evaluating Contextually Personalized Programming Exercises Created with Generative AI

Evanfiya Logacheva; Arto Hellas; James Prather; Sami Sarsa; Juho Leinonen

TL;DR

The paper investigates generating contextually personalized programming exercises with GPT-4 for an elective Dart course, evaluating quality via expert rubrics and student feedback, and analyzing how students interact with the AI-generated material. The approach combines prompt engineering, a Dart-based generation pipeline, and an integrated tool that presents exercises with theming and difficulty controls, assessing authenticity, usefulness, and engagement. Findings show generally high-quality content with strong theme/topic alignment and positive student reception, though many exercises are only shallowly personalized and some difficulty matches are imperfect. The work demonstrates that AI-generated personalized exercises can be a scalable, engaging supplement for introductory programming education, with implications for instructional design and future enhancements such as on-demand generation and deeper personalization.

Abstract

Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affects their engagement. However, creating a varied and comprehensive set of programming exercises for students to practice on is a time-consuming and laborious task for computer science educators. Previous studies have shown that large language models can generate conceptually and contextually relevant programming exercises. Thus, they offer a possibility to automatically produce personalized programming problems to fit students' interests and needs. This article reports on a user study conducted in an elective introductory programming course that included contextually personalized programming exercises created with GPT-4. The quality of the exercises was evaluated by both the students and the authors. Additionally, this work investigated student attitudes towards the created exercises and their engagement with the system. The results demonstrate that the quality of exercises generated with GPT-4 was generally high. What is more, the course participants found them engaging and useful. This suggests that AI-generated programming problems can be a worthwhile addition to introductory programming courses, as they provide students with a practically unlimited pool of practice material tailored to their personal interests and educational needs.

Paper Structure (23 sections, 12 figures, 8 tables)

Introduction
Related Work
Context Personalization
Perceptions of Assessment Quality
Large Language Models in Computing Education
Automatic Programming Exercise Generation
Methods
Prompt Engineering
Exercise Generation
Study Context
Tool
Data Collection
Approach
RQ1: Expert Evaluation by the Study Authors
RQ2: Student Evaluation
...and 8 more sections

Figures (12)

Figure 1: A screenshot of nonsensical output containing emojis.
Figure 2: A screenshot of the exercise selection functionality with a problem description shown. In the screenshot, the user has already completed two exercises. The user has selected "board games" as the theme, "arithmetics" as the concept, and "normal" as the difficulty, and then pressed the "Get Exercise" button. The button is now labeled "Get New Exercise" because an exercise has already been retrieved. In this example, an exercise about Carcassonne has been retrieved.
Figure 3: An exercise example with overly advanced concepts.
Figure 4: An exercise containing a factual error.
Figure 5: An exercise with shallow personalization.
...and 7 more figures
