Cracking the Code: Evaluating Zero-Shot Prompting Methods for Providing Programming Feedback
Niklas Ippisch, Anna-Carolina Haensch, Jan Simson, Jacob Beck, Markus Herklotz, Malte Schierholz
TL;DR
This work addresses how to elicit high-quality feedback from large language models on beginner programming errors. It introduces a structured evaluation framework, grounded in Ryan et al. (2020), and compares four zero-shot prompting strategies (Chain of Thought, Prompt Chaining, Tree of Thought, and ReAct) against a vanilla baseline on beginner programming errors in R. Key findings indicate that enforcing a stepwise process enhances feedback precision, while omitting explicit data references can improve error identification, which points to a trade-off between localization and actionable remediation. The framework is designed to be transferable to other programming languages and coding tasks, supporting educators and researchers in systematically assessing the quality of LLM-generated feedback.
Abstract
Despite the growing use of large language models (LLMs) for providing feedback, little research has examined how to obtain high-quality feedback from them. This case study introduces an evaluation framework for assessing different zero-shot prompt engineering methods. We varied the prompts systematically and analyzed the resulting feedback on programming errors in R. The results suggest that prompts prescribing a stepwise procedure increase the precision of the feedback, while omitting explicit specifications about which of the provided data to analyze improves error identification.
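
To make the idea of systematic prompt variation concrete, the following is a minimal sketch in Python, not the prompts or pipeline used in the study. It crosses the two factors named above (a stepwise instruction and an explicit specification of which data to analyze). The fragment wording, the function names `build_prompt` and `prompt_grid`, and the example inputs are all hypothetical.

```python
from itertools import product

# Hypothetical prompt fragments; the study's actual wording is not reproduced here.
BASE = "You are a tutor. Give feedback on the student's R code below."
STEPWISE = "Work through the problem step by step before giving feedback."
DATA_SPEC = "Base your analysis on the student's code and the error message provided."


def build_prompt(stepwise: bool, data_spec: bool,
                 student_code: str, error_message: str) -> str:
    """Assemble one zero-shot prompt from the optional fragments."""
    parts = [BASE]
    if stepwise:
        parts.append(STEPWISE)
    if data_spec:
        parts.append(DATA_SPEC)
    parts.append("Student code:\n" + student_code)
    parts.append("Error message:\n" + error_message)
    return "\n\n".join(parts)


def prompt_grid(student_code: str, error_message: str) -> dict:
    """Return every combination of the two factors, keyed by (stepwise, data_spec)."""
    return {
        (stepwise, data_spec): build_prompt(stepwise, data_spec,
                                            student_code, error_message)
        for stepwise, data_spec in product([False, True], repeat=2)
    }


if __name__ == "__main__":
    grid = prompt_grid("x <- c(1, 2", "unexpected end of input")
    for condition, prompt in grid.items():
        print(condition)
        print(prompt)
        print("-" * 40)
```

Each of the four prompts would then be sent to the model for the same student submission, so that differences in the returned feedback can be attributed to the varied fragment rather than to the input.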
