Safe Generative Chats in a WhatsApp Intelligent Tutoring System
Zachary Levonian, Owen Henkel
TL;DR
This paper addresses the safety challenges of deploying large language models in Intelligent Tutoring Systems by integrating a semi-structured, growth-mindset chat within a WhatsApp math tutor and evaluating it through educator-led red-teaming, an in-classroom usability test, and large-scale field deployment. The authors implement a dual-filter moderation pipeline (a word list and the OpenAI moderation API) and design the conversation flow to minimize instigator and imposter risks while examining yea-sayer responses. Across more than 8,000 student interactions, GPT-3.5 outputs were largely safe; the main challenges arose from handling inappropriate or sensitive student inputs and from edge-case content moderation decisions. The work provides practical guidelines for content moderation and classroom-management integration, highlighting the value of red-teaming for threshold setting and the importance of connecting high-risk cases to human support in educational settings.
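The dual-filter idea in the summary above can be illustrated with a minimal sketch. This is not the authors' implementation: the word list entries, the `moderate` function, the `ModerationResult` type, the `score_fn` parameter, and the default threshold are all hypothetical names and values chosen for illustration. The second filter is passed in as a callable so that any classifier returning per-category scores (for example, a wrapper around a hosted moderation endpoint) could be plugged in.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

# Hypothetical placeholder entries; the actual word list is not given in the text.
BLOCKED_WORDS: Set[str] = {"badword", "slur"}


@dataclass
class ModerationResult:
    allowed: bool
    reason: Optional[str] = None  # which filter blocked the message, if any


def word_list_filter(message: str, blocked: Set[str]) -> bool:
    """Return True if any lowercase token of the message is in the blocked set."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return any(token in blocked for token in tokens)


def moderate(
    message: str,
    score_fn: Callable[[str], Dict[str, float]],
    blocked: Set[str] = BLOCKED_WORDS,
    threshold: float = 0.5,  # assumed value; in practice tuned, e.g. via red-teaming
) -> ModerationResult:
    """Apply the word-list filter first, then the score-based filter."""
    # Filter 1: cheap exact-match check against the word list.
    if word_list_filter(message, blocked):
        return ModerationResult(False, "word_list")

    # Filter 2: per-category scores from an external classifier (assumed interface).
    scores = score_fn(message)
    if scores:
        category, score = max(scores.items(), key=lambda item: item[1])
        if score >= threshold:
            return ModerationResult(False, f"classifier:{category}")

    return ModerationResult(True)
```

Under these assumptions, a blocked result tagged with a high-risk category could be routed to a human rather than silently dropped, and the `threshold` argument is the knob that red-teaming would help set.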
Abstract
Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, that flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITSs requires designing safeguards that ensure positive experiences for students. We describe the design of a conversational system integrated into an ITS, and our experience evaluating its safety with red-teaming, an in-classroom usability test, and field deployment. We present empirical data from more than 8,000 student conversations with this system, finding that GPT-3.5 rarely generates inappropriate messages. Comparatively more common are inappropriate messages from students, which prompts us to reason about safeguarding as a content moderation and classroom management problem. The student interaction behaviors we observe have implications for designers, who should focus on student inputs as a content moderation problem, and for researchers, who should focus on subtle forms of bad content.
