
Experiences from Integrating Large Language Model Chatbots into the Classroom

Arto Hellas, Juho Leinonen, Leo Leppänen

TL;DR

The paper examines the introduction of an unfiltered GPT-4-based chatbot into three CS courses to understand whether such access fosters widespread reliance or only selective engagement. Using post-use usefulness ratings, background surveys, and a per-chapter usage coefficient, the study finds that most students do not rely heavily on the chatbot. Nearly all students in the LLM-focused course used it, but in every course most of the usage came from a small set of superusers. Usage also differs across courses and chapters, and prior LLM experience correlates negatively with usage, which suggests trial-and-error exploration among less experienced students. The findings imply that unrestricted LLM access can be manageable in classroom settings. They also highlight the need for tailored, scaffolded LLM experiences and for retrieval-augmented approaches that better align with course objectives and keep pace with current technology.
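The TL;DR reports a negative correlation between prior LLM experience and chatbot usage. The summary does not say which correlation measure the authors used. As an illustration only, the sketch below computes Spearman's rank correlation, a common choice for ordinal data such as self-reported experience levels. The function names `average_ranks` and `spearman` and the toy data are hypothetical and do not come from the paper.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(sxx * syy)


# Hypothetical toy data: self-reported prior LLM experience (1-5)
# and the number of chatbot messages each student sent.
experience = [1, 2, 3, 4, 5]
messages = [40, 30, 12, 5, 2]
print(spearman(experience, messages))  # -1.0 for a perfectly inverse ordering
```

A rank-based measure is used here because it only assumes that more experience and more messages can be ordered, not that they are related linearly. A negative value means that students reporting more experience tended to send fewer messages.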

Abstract

In the present study, we provided students with unfiltered access to a state-of-the-art large language model (LLM) chatbot. The chatbot was intentionally designed to mimic proprietary commercial chatbots such as ChatGPT, which have not been tailored for the educational context. The underlying engine was OpenAI GPT-4. The chatbot was integrated into the online learning materials of three courses. One of the courses focused on software engineering with LLMs, while the other two were not directly related to LLMs. Our results suggest that only a minority of students engage with the chatbot in the courses unrelated to LLMs. At the same time, unsurprisingly, nearly all students in the LLM-focused course used the chatbot. In all courses, the majority of LLM usage came from a few superusers, whereas most students did not use the chatbot heavily, even though it was readily available and effectively provided free access to the OpenAI GPT-4 model. We also observe that, in addition to using the chatbot for course-specific purposes, many students use it for their own purposes. These results suggest that the worst fears of educators (all students overrelying on LLMs) did not materialize, even when chatbot access was unfiltered. Finally, we discuss potential reasons for the low usage and suggest the need for more tailored and scaffolded LLM experiences targeted at specific types of student use cases.
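The abstract's central observation is that most usage came from a few superusers while most students used the chatbot little or not at all. Two simple descriptive statistics make this concrete. The sketch below is not the paper's analysis, since the summary does not describe the exact measure. It assumes a hypothetical message log with one student id per message. The function name `usage_summary`, the 10% cutoff, and the toy data are all illustrative.

```python
import math
from collections import Counter


def usage_summary(message_senders, enrolled, top_fraction=0.1):
    """Summarize how concentrated chatbot usage is within one course.

    message_senders: one student id per chatbot message (hypothetical log format).
    enrolled: ids of all enrolled students, including those who never used the chatbot.
    Messages from ids not in `enrolled` are ignored.
    Returns (share of enrolled students with at least one message,
             share of all messages sent by the top `top_fraction` of students).
    """
    counts = Counter(message_senders)
    per_student = sorted((counts.get(s, 0) for s in enrolled), reverse=True)
    if not per_student:
        raise ValueError("no enrolled students")
    total = sum(per_student)
    active_share = sum(1 for c in per_student if c > 0) / len(per_student)
    if total == 0:
        return active_share, 0.0
    # Always count at least one student as "top", even for tiny courses.
    k = max(1, math.ceil(top_fraction * len(per_student)))
    return active_share, sum(per_student[:k]) / total


# Hypothetical toy course: 10 enrolled students, only 4 of whom used the chatbot.
enrolled = [f"s{i}" for i in range(1, 11)]
log = ["s1"] * 70 + ["s2"] * 20 + ["s3"] * 6 + ["s4"] * 4
print(usage_summary(log, enrolled))  # (0.4, 0.7)
```

Counting from the enrolled roster rather than from the log alone matters here. Students who never opened the chatbot would otherwise be invisible, which would overstate how widely it was used.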

Paper Structure (22 sections, 1 figure, 5 tables)