LLM Chatbots in High School Programming: Exploring Behaviors and Interventions

Manuel Valle Torre, Marcus Specht, Catharine Oertel

TL;DR

This study examines how to integrate LLMs into high school programming education through a Design-Based Research (DBR) cycle with an Intervention Group (IG) and a non-equivalent Comparison Group (CG), in a two-phase course run in the JELAI environment. Under unguided use, a higher share of executive (solution-seeking) queries correlated with lower midterm performance ($\rho = -0.502$, $p = 0.034$). After a targeted intervention that taught instrumental help-seeking, executive queries decreased significantly ($W = 15.0$, $p = 0.036$, $r = 0.560$), yet the final exam improvement was not statistically significant ($p = 0.083$, $r = 0.409$), suggesting that changing tool-use strategies alone does not overcome foundational knowledge gaps. Qualitative cases show that some students benefited from shifted help-seeking patterns (e.g., Mary: 7.30 to 8.65), while others saw no corresponding gain (e.g., Sam: 5.50 to 4.37), underscoring the complexity of learning with AI tutors. The findings indicate that the educational value of LLMs depends on pedagogy that scaffolds productive engagement, and they yield design principles for integrating AI in programming classrooms, including adaptive scaffolding and learner-visible analytics.

Abstract

This study uses a Design-Based Research (DBR) cycle to refine the integration of Large Language Models (LLMs) in high school programming education. The initial problem was identified in an Intervention Group where, in an unguided setting, a higher proportion of executive, solution-seeking queries correlated strongly and negatively with exam performance. A contemporaneous Comparison Group demonstrated that without guidance, these unproductive help-seeking patterns do not self-correct, with engagement fluctuating and eventually declining. This insight prompted a mid-course pedagogical intervention in the first group, designed to teach instrumental help-seeking. The subsequent evaluation confirmed the intervention's success, revealing a decrease in executive queries, as well as a shift toward more productive learning workflows. However, this behavioral change did not translate into a statistically significant improvement in exam grades, suggesting that altering tool-use strategies alone may be insufficient to overcome foundational knowledge gaps. The DBR process thus yields a more nuanced principle: the educational value of an LLM depends on a pedagogy that scaffolds help-seeking, but this is only one part of the complex process of learning.
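
To make the reported statistics more concrete, the following is a minimal sketch of how a rank correlation and a paired signed-rank statistic can be computed. It assumes that $\rho$ denotes Spearman's rank correlation and $W$ the Wilcoxon signed-rank statistic, which is the conventional reading of those symbols but is not confirmed by this summary. All data values, variable names, and helper functions below are hypothetical and chosen only for illustration; they are not the study's data or analysis code.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def wilcoxon_w(before, after):
    """Wilcoxon signed-rank statistic: the smaller of the two signed rank sums.

    Pairs with zero difference are dropped before ranking.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical per-student data (NOT from the study).
executive_share = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]  # share of executive queries
midterm_grade = [8.5, 7.9, 7.0, 6.1, 6.4, 4.8]

# Hypothetical executive-query counts for the same students, before and after.
executive_before = [12, 9, 14, 6, 11]
executive_after = [7, 8, 8, 8, 5]

rho = spearman_rho(executive_share, midterm_grade)
w = wilcoxon_w(executive_before, executive_after)
print(f"rho = {rho:.3f}, W = {w}")
```

A negative `rho` here means that students with a larger share of executive queries tend to rank lower on the grade, and a small `w` means that most paired differences point in the same direction. Computing p-values and the effect size $r$ would require further steps that this sketch leaves out.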

Paper Structure

This paper contains 14 sections and 4 figures.

Figures (4)

  • Figure 1: JELAI interface with a sample task in the Jupyter Notebook on the right and a task-specific LLM-based tutor on the left
  • Figure 2: Correlations between student interaction types and grades. The left column correlates data up to the midterm with midterm grades; the right column correlates data after the midterm with final exam grades. (* $p<0.05$, ** $p<0.01$, *** $p<0.001$)
  • Figure 3: Weekly message counts and proportions per type for the Intervention Group
  • Figure 4: Weekly message counts and proportions per type for the Comparison Group