LLM Chatbots in High School Programming: Exploring Behaviors and Interventions
Manuel Valle Torre, Marcus Specht, Catharine Oertel
TL;DR
This study examines how to integrate LLMs into high school programming education using a Design-Based Research (DBR) cycle with an Intervention Group (IG) and a non-equivalent Comparison Group (CG) in a two-phase course facilitated by the JELAI environment. In the unguided phase, a higher proportion of executive queries correlated with lower midterm performance ($\rho = -0.502$, $p = 0.034$). After a targeted intervention that taught instrumental help-seeking, executive queries decreased significantly ($W = 15.0$, $p = 0.036$, $r = 0.560$). Final exam improvements, however, were not statistically significant ($p = 0.083$, $r = 0.409$), suggesting that changing tool-use strategies alone does not overcome foundational knowledge gaps. Qualitative cases show that some students benefited from shifted help-seeking patterns (e.g., Mary: 7.30 to 8.65) while others did not experience proportional gains (e.g., Sam: 5.50 to 4.37), underscoring the complexity of learning with AI tutors. The findings highlight that the educational value of LLMs depends on pedagogy that scaffolds productive engagement, and they offer design principles for integrating AI in programming classrooms, including adaptive scaffolding and learner-visible analytics.
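For readers unfamiliar with the statistic, the following is a minimal sketch of how a rank correlation between the proportion of executive queries and exam grades could be computed. It assumes that $\rho$ denotes Spearman's rank correlation, which the summary does not state explicitly. The data are synthetic placeholders, not the study's data, and the names `executive_share` and `midterm_grade` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, illustrative values only (NOT taken from the study).
executive_share = [0.10, 0.25, 0.40, 0.55, 0.70]  # hypothetical share of executive queries
midterm_grade = [8.0, 7.5, 6.0, 6.5, 4.0]         # hypothetical grades

rho = spearman_rho(executive_share, midterm_grade)
print(round(rho, 3))
```

A negative value indicates that students with a larger share of executive queries tend to rank lower on the exam. A significance test such as the reported $p$-value would require an additional step that is not shown here.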
Abstract
This study uses a Design-Based Research (DBR) cycle to refine the integration of Large Language Models (LLMs) in high school programming education. The initial problem was identified in an Intervention Group where, in an unguided setting, a higher proportion of executive, solution-seeking queries correlated strongly and negatively with exam performance. A contemporaneous Comparison Group showed that, without guidance, these unproductive help-seeking patterns do not self-correct, with engagement fluctuating and eventually declining. This insight prompted a mid-course pedagogical intervention in the first group, designed to teach instrumental help-seeking. The subsequent evaluation showed that the intervention changed behavior: executive queries decreased and students shifted toward more productive learning workflows. However, this behavioral change did not translate into a statistically significant improvement in exam grades, suggesting that altering tool-use strategies alone may be insufficient to overcome foundational knowledge gaps. The DBR process thus yields a more nuanced principle: the educational value of an LLM depends on a pedagogy that scaffolds help-seeking, but this is only one part of the complex process of learning.
