
Understanding Help-Seeking Behavior of Students Using LLMs vs. Web Search for Writing SQL Queries

Harsh Kumar, Mohi Reza, Jeb Mitchell, Ilya Musabirov, Lisa Zhang, Michael Liut

TL;DR

The study addresses how students seek help for SQL using web search versus LLMs in CS education. It employs a randomized interview design with 39 students comparing web search, ChatGPT, and an instructor-tuned LLM across two SQL tasks, measuring interactions, edits, query quality, and mental demand. Key findings show the instructor-tuned LLM drives higher engagement (more than $2\times$ as many interactions) without significantly altering final SQL quality or edits, and with a trend toward lower mental demand. The work demonstrates the potential of cost-effective, instructor-guided LLMs to scaffold learning and inform the design of AI tutors in programming education.

Abstract

Growth in the use of large language models (LLMs) in programming education is altering how students write SQL queries. Traditionally, students relied heavily on web search for coding assistance, but this has shifted with the adoption of LLMs like ChatGPT. However, the comparative process and outcomes of using web search versus LLMs for coding help remain underexplored. To address this, we conducted a randomized interview study in a database classroom to compare web search and LLMs, including a publicly available LLM (ChatGPT) and an instructor-tuned LLM, for writing SQL queries. Our findings indicate that using the instructor-tuned LLM required significantly more interactions than both ChatGPT and web search, but resulted in a similar number of edits to the final SQL query. No significant differences were found in the quality of the final SQL queries between conditions, although the LLM conditions directionally showed higher query quality. Furthermore, students using the instructor-tuned LLM reported lower mental demand. These results have implications for learning and productivity in programming education.

Paper Structure (23 sections, 2 figures)

Figures (2)

  • Figure 1: Comparative Analysis of Number of Interactions and Number of Changes to SQL Query between Conditions. The left panel shows the average number of interactions, indicating a higher number of interactions for the Instructor-tuned LLM condition. The right panel shows the average number of changes made to the final SQL query, with no significant differences between conditions. Error bars represent ± one standard error of the mean.
  • Figure 2: Comparative Analysis of Students' Self-Reported Mental Demand during Task and Correctness of SQL Queries between Conditions. The left panel shows the average reported mental demand, with no significant differences between conditions but a directionally lower average for students using the Instructor-tuned LLM. The right panel shows the average correctness of SQL queries, highlighting higher correctness when students used either of the LLMs compared to the web search condition. Error bars represent ± one standard error of the mean.
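
Both captions define the error bars as one standard error of the mean on either side of each condition's average. As a minimal sketch of that calculation, assuming the sample standard deviation (n - 1 denominator) is used, the snippet below computes a mean and its error-bar bounds. The function name `mean_and_sem` and the input values are hypothetical illustrations and are not data from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical per-student interaction counts, NOT taken from the paper.
example_counts = [2, 4, 4, 6]
mean, sem = mean_and_sem(example_counts)

# Error bar bounds as described in the captions: mean ± one SEM.
lower, upper = mean - sem, mean + sem
```

Bars drawn this way describe the uncertainty of each condition's average rather than the spread of individual students, so overlapping bars alone do not establish whether a difference is significant.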