Enhancing Programming Error Messages in Real Time with Generative AI
Bailey Kimmel, Austin Geisert, Lily Yaro, Brendan Gipson, Taylor Hotchkiss, Sidney Osae-Asante, Hunter Vaught, Grant Wininger, Chase Yamaguchi
TL;DR
The paper investigates real-time GPT-4 feedback integrated into the Athene automated assessment tool to augment programming error messages across compile, run-time, and logic errors in a CS1 course. Using a PHP plug-in, submissions to the Prime Factorization assignment were annotated with hint-based AI feedback that avoided disclosing solution code, and data were collected from submission logs and two rounds of student surveys (n=52). Results show that AI feedback did not automatically improve outcomes: the mean number of submissions increased in the studied semester, and student perceptions were mixed, with concerns about vagueness and occasional inaccuracies and a preference for interactive, follow-up dialogue. The study highlights the importance of interface design and conversational capabilities for effective AI-assisted feedback in CS education and offers guidance for future tool development to balance support with student autonomy.
Abstract
Generative AI is changing the way that many disciplines are taught, including computer science. Researchers have shown that generative AI tools are capable of solving programming problems, writing extensive blocks of code, and explaining complex code in simple terms. Particular promise has been shown in using generative AI to enhance programming error messages. Both students and instructors have complained for decades that these messages are often cryptic and difficult to understand. Yet recent work has shown that students make fewer repeated errors when these messages are enhanced via GPT-4. We extend this work by implementing feedback from ChatGPT for all programs submitted to our automated assessment tool, Athene, providing help for compiler, run-time, and logic errors. Our results indicate that adding generative AI to an automated assessment tool does not necessarily make it better and that the design of the interface matters greatly to the usability of the feedback that GPT-4 provided.
