Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices
Ruiwei Xiao, Xinying Hou, John Stamper
TL;DR
This paper investigates how multiple levels of GPT-generated programming hints affect novice problem-solving. Using an IRB-approved think-aloud study with 12 CS1 students, the authors compare four hint levels delivered by the LLM Hint Factory and evaluate hint quality via expert rubrics and learner outcomes. They find that high-level natural-language hints alone can be ineffective or misleading for next-step and syntax issues, whereas worked-example hints substantially improve progress and learning. The study contributes a scalable, multi-level hinting system and offers design guidance for personalizing hint content and format to meet diverse help-seeking needs in AI-assisted programming education.
Abstract
Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, such as code examples with in-line comments, can better support students. The findings open up future work on customizing help responses in terms of content, format, and granularity to accurately identify and meet students' learning needs.
