Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

Ruiwei Xiao; Xinying Hou; John Stamper

TL;DR

This paper investigates how multiple levels of GPT-generated programming hints affect novice problem-solving. Using an IRB-approved think-aloud study with 12 CS1 students, the authors compare four hint levels delivered by the LLM Hint Factory and evaluate hint quality via expert rubrics and learner outcomes. They find that high-level natural-language hints alone can be ineffective or misleading for next-step and syntax issues, whereas worked-example hints substantially improve progress and learning. The study contributes a scalable, multi-level hinting system and offers design guidance for personalizing hint content and format to meet diverse help-seeking needs in AI-assisted programming education.

Abstract

Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing help responses in terms of content, format, and granularity to accurately identify and meet students' learning needs.

Paper Structure (56 sections, 2 figures, 3 tables)

Introduction
Related Work
Hint delivery in traditional intelligent programming tutors
LLM-Based Intelligent Programming Tutors
LLM Hint Factory
Method
Results
Overall Hint Quality in the LLM Hint Factory
LLM Hint Factory can generate high-quality multiple levels of hints
High-quality hints are not always helpful for students’ problem-solving
The Effectiveness of Different Levels of Programming Hints for Supporting Novices
Providing hints till the level of worked example can assist students properly.
Worked example hints provide comprehensive help on next-step and syntax-related hint requests
High-level hints can provide more concise help for students to debug logic errors
High-level hints can cause more misunderstanding and frustration than low-level hints
...and 41 more sections

Figures (2)

Figure 1: LLM Hint Factory Interface. (1) Problem description; (2) Code editor; (3) Hint Section; (3.1) Orientation Hint: the 1st level hint, informs students where they should focus; (3.2) Instrumental Hint: the 2nd level hint, informs students what to do next in concise, descriptive sentences; (3.3) Worked Example Hint: the 3rd level hint, shows students an example code snippet that is similar to the code they need to write for their next step to solve the current problem; (3.4) Bottom-Out Hint: the 4th level hint, shows students the exact code they need to write for the next step to solve the current problem; (4) Click "Be More General" to see the previous level's hint; (5) Click "Be More Specific" to see the next level's hint; (6) Click "New Hint" to generate a new set of hints.
Figure 2: Number of effective levels of hint for each help-seeking type. Most requests related to next-step (NL, NS) and syntax (NS, DS) were resolved when learners used worked example hints. Most DL requests could be answered by high-level hints.
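
The four-level navigation described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`HintLevel`, `HintNavigator`, `placeholder_generator`) are hypothetical, the generator returns placeholder strings rather than calling an LLM, and the choice to start at (and reset to) the Orientation level is an assumption that the caption does not state.

```python
from enum import IntEnum
from typing import Callable, Dict


class HintLevel(IntEnum):
    """The four hint levels named in the Figure 1 caption."""
    ORIENTATION = 1      # where to focus
    INSTRUMENTAL = 2     # what to do next, in concise sentences
    WORKED_EXAMPLE = 3   # similar example code snippet
    BOTTOM_OUT = 4       # exact code for the next step


HintSet = Dict[HintLevel, str]


class HintNavigator:
    """Hypothetical model of the three hint buttons in the interface."""

    def __init__(self, generate: Callable[[], HintSet]) -> None:
        self._generate = generate
        self._hints = generate()
        # Assumption: a fresh hint set is first shown at the most general level.
        self.level = HintLevel.ORIENTATION

    def current(self) -> str:
        return self._hints[self.level]

    def be_more_specific(self) -> str:
        # Move to the next level; stay put at the Bottom-Out Hint.
        if self.level < HintLevel.BOTTOM_OUT:
            self.level = HintLevel(self.level + 1)
        return self.current()

    def be_more_general(self) -> str:
        # Move to the previous level; stay put at the Orientation Hint.
        if self.level > HintLevel.ORIENTATION:
            self.level = HintLevel(self.level - 1)
        return self.current()

    def new_hint(self) -> str:
        # Regenerate the whole set of hints and reset to the first level.
        self._hints = self._generate()
        self.level = HintLevel.ORIENTATION
        return self.current()


def placeholder_generator() -> HintSet:
    """Stand-in for a hint generator; returns placeholder text only."""
    return {level: f"[{level.name} hint placeholder]" for level in HintLevel}
```

A navigator built with `HintNavigator(placeholder_generator)` starts at the Orientation Hint, and repeated calls to `be_more_specific()` step through the remaining levels until the Bottom-Out Hint is reached. Clamping at both ends is one plausible design choice; the real interface may instead disable the buttons.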
