Safe Generative Chats in a WhatsApp Intelligent Tutoring System

Zachary Levonian, Owen Henkel

TL;DR

This paper addresses the safety challenges of deploying large language models in Intelligent Tutoring Systems by integrating a semi-structured, growth-mindset chat into a WhatsApp math tutor and evaluating it through educator-led red-teaming, an in-classroom usability test, and a large-scale field deployment. The authors implement a dual-filter moderation pipeline (a word list and the OpenAI moderation API) and design the conversation flow to minimize instigator and imposter risks while examining yea-sayer responses. Across more than 8,000 student conversations, GPT-3.5 outputs were largely safe; the main challenges arose from handling inappropriate or sensitive student inputs and from edge-case content moderation decisions. The work provides practical guidelines for content moderation and classroom-management integration, highlighting the value of red-teaming for threshold setting and the importance of connecting high-risk cases to human support in educational settings.
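The dual-filter idea can be illustrated with a minimal sketch. This is not the authors' implementation: the filter order, the threshold value, and all names (`make_dual_filter`, `ModerationResult`, `stub_scores`) are assumptions made for illustration. The second filter is passed in as a scoring function so the example runs without network access; in a deployment it might wrap a call to a moderation service.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Hypothetical type: maps a message to per-category scores in [0, 1].
ScoreFn = Callable[[str], Dict[str, float]]


@dataclass
class ModerationResult:
    allowed: bool
    reason: Optional[str] = None  # which filter blocked the message, if any


def make_dual_filter(blocked_words: Iterable[str],
                     score_fn: ScoreFn,
                     threshold: float = 0.5):
    """Build a moderate(message) function that applies two filters in turn."""
    blocked = {w.lower() for w in blocked_words}

    def moderate(message: str) -> ModerationResult:
        # Filter 1: exact word-list match on lowercase tokens.
        for token in re.findall(r"[a-z']+", message.lower()):
            if token in blocked:
                return ModerationResult(False, "word_list")
        # Filter 2: block if any category score reaches the threshold.
        # The threshold is an assumed value; the paper reports using
        # red-teaming to inform threshold setting.
        for category, score in score_fn(message).items():
            if score >= threshold:
                return ModerationResult(False, f"moderation:{category}")
        return ModerationResult(True)

    return moderate


# Stub scorer standing in for a real moderation service (assumption).
def stub_scores(message: str) -> Dict[str, float]:
    return {"self-harm": 0.9 if "hurt myself" in message.lower() else 0.0}


moderate = make_dual_filter(["darn"], stub_scores, threshold=0.5)
print(moderate("I like fractions"))
print(moderate("Darn this problem"))
print(moderate("I want to hurt myself"))
```

Returning the reason alongside the decision lets a caller route different cases differently, for example escalating a high-risk category to human support rather than only suppressing the message.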

Abstract

Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, that flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITSs requires designing safeguards that ensure positive experiences for students. We describe the design of a conversational system integrated into an ITS, and our experience evaluating its safety with red-teaming, an in-classroom usability test, and field deployment. We present empirical data from more than 8,000 student conversations with this system, finding that GPT-3.5 rarely generates inappropriate messages. Comparatively more common are inappropriate messages from students, which prompts us to reason about safeguarding as a content moderation and classroom management problem. The student interaction behaviors we observe have implications for designers (to focus on student inputs as a content moderation problem) and for researchers (to focus on subtle forms of bad content).

Paper Structure (10 sections, 4 figures, 4 tables)

This paper contains 10 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Designing for safety: our process.
  • Figure 2: The generative chat moderation system.
  • Figure 3: A chat excerpt from the Rori WhatsApp interface and a simplified view of the conversation phases.
  • Figure 4: Conversation length (as number of student messages) for all conversations. Completion rate was higher during the usability test (59.5%) than the field deployment (38.9%).