Using Large Language Models to Enhance Programming Error Messages

Juho Leinonen, Arto Hellas, Sami Sarsa, Brent Reeves, Paul Denny, James Prather, Brett A. Becker

TL;DR

The paper investigates using large language models, specifically OpenAI Codex, to enhance programming error messages (PEMs) for novice learners. Using nine hard-to-read Python PEMs and programs at three levels of complexity, the authors generate explanations and fix suggestions with Codex under varied prompts and temperatures, then evaluate the outputs with expert raters. Codex produces highly comprehensible explanations (about 88% on average), but correctness varies: roughly 11–83% for explanations and about 33% for fixes, with temperature 0 generally producing better results. The work demonstrates the potential of LLMs as educational scaffolds for PEMs, but it also highlights important limitations and the need for safeguards (e.g., human-in-the-loop or two-tier systems) before classroom deployment.

Abstract

A key part of learning to program is learning to understand programming error messages. They can be hard to interpret, and identifying the cause of errors can be time-consuming. One factor in this challenge is that the messages are typically intended for an audience that already knows how to program, or even for programming environments that then use the information to highlight areas in code. Researchers have been working on making these errors more novice-friendly since the 1960s; however, progress has been slow. The present work contributes to this stream of research by using large language models to enhance programming error messages with explanations of the errors and suggestions on how to fix them. Large language models can be used to create useful and novice-friendly enhancements to programming error messages that sometimes surpass the original programming error messages in interpretability and actionability. These results provide further evidence of the benefits of large language models for computing educators, highlighting their use in areas known to be challenging for students. We further discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.

Paper Structure (15 sections, 2 tables)