Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System

Robin Schmucker; Meng Xia; Amos Azaria; Tom Mitchell

TL;DR

This paper tackles the high cost of authoring content for conversational tutoring systems (CTSs) by introducing Ruffle&Riley, a large language model (LLM)-driven CTS that automatically generates tutoring scripts from lesson text and orchestrates free-form dialogue via two agents in a learning-by-teaching setup. Grounded in the expectation and misconception-tailored (EMT) dialogue framework, the system supports AI-assisted content authoring and automated script orchestration: human learners teach the student agent (Ruffle) and receive guidance from the professor agent (Riley). Two online studies (N=200) on biology lessons compare Ruffle&Riley to QA chatbots and a reading activity. They reveal higher engagement and perceived understanding with Ruffle&Riley but no significant short-term learning gains over reading; an interaction analysis identifies usage patterns linked to outcomes and suggests directions for improving feedback and reducing gaming behavior. The work provides practical insights for designing scalable LLM-based CTSs and releases the system as open source to foster ongoing research on instructional design for learning technologies, highlighting both the potential and the challenges of such systems in real-world educational settings.

Abstract

Conversational tutoring systems (CTSs) offer learning experiences through interactions based on natural language. They are recognized for promoting cognitive engagement and improving learning outcomes, especially in reasoning tasks. Nonetheless, the cost associated with authoring CTS content is a major obstacle to widespread adoption and to research on effective instructional design. In this paper, we discuss and evaluate a novel type of CTS that leverages recent advances in large language models (LLMs) in two ways: First, the system enables AI-assisted content authoring by inducing an easily editable tutoring script automatically from a lesson text. Second, the system automates the script orchestration in a learning-by-teaching format via two LLM-based agents (Ruffle&Riley) acting as a student and a professor. The system allows for free-form conversations that follow the ITS-typical inner and outer loop structure. We evaluate Ruffle&Riley's ability to support biology lessons in two between-subjects online user studies (N = 200) comparing the system to simpler QA chatbots and a reading activity. Analyzing system usage patterns, pre/post-test scores and user experience surveys, we find that Ruffle&Riley users report high levels of engagement and understanding and perceive the offered support as helpful. Even though Ruffle&Riley users require more time to complete the activity, we did not find significant differences in short-term learning gains over the reading activity. Our system architecture and user study provide various insights for designers of future CTSs. We further open-source our system to support ongoing research on effective instructional design of LLM-based learning technologies.
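The outer and inner loop structure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tutoring script is a list of questions, each with a list of expectations, and it replaces both LLM agents with plain callables. The names `ScriptItem`, `keyword_covers`, `orchestrate` and the `max_turns` budget are hypothetical, and the naive keyword matcher only stands in for the LLM's judgment of whether the learner has articulated an expectation. Misconception detection and help requests (handled by Riley in the real system) are omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScriptItem:
    """One tutoring-script entry: a question and the expectations a
    complete learner explanation should cover (hypothetical structure)."""
    question: str
    expectations: List[str]


def keyword_covers(reply: str, expectation: str) -> bool:
    """Hypothetical stand-in for the LLM's judgment: an expectation counts
    as covered if all of its words appear in the learner's reply."""
    reply_words = set(re.findall(r"[a-z]+", reply.lower()))
    return all(w in reply_words for w in re.findall(r"[a-z]+", expectation.lower()))


def orchestrate(
    script: List[ScriptItem],
    learner: Callable[[str], str],
    covers: Callable[[str, str], bool] = keyword_covers,
    max_turns: int = 3,
) -> List[Dict[str, object]]:
    """Outer loop: move through the script's questions in order.
    Inner loop: keep prompting on the current question until every
    expectation is covered or the (hypothetical) turn budget runs out."""
    summary: List[Dict[str, object]] = []
    for item in script:  # outer loop over questions
        remaining = list(item.expectations)
        prompt = item.question
        turns = 0
        while remaining and turns < max_turns:  # inner loop over learner turns
            reply = learner(prompt)
            turns += 1
            # Drop every expectation the latest reply articulates.
            remaining = [e for e in remaining if not covers(reply, e)]
            if remaining:
                # Student-agent-style follow-up nudging toward a missing expectation.
                prompt = f"I'm still not sure I get it. What about '{remaining[0]}'?"
        summary.append({"question": item.question, "turns": turns, "uncovered": remaining})
    return summary


if __name__ == "__main__":
    script = [ScriptItem("What do enzymes do?", ["speed up reactions", "lower activation energy"])]
    replies = iter(["They speed up reactions.", "They also lower activation energy."])
    print(orchestrate(script, lambda prompt: next(replies)))
```

In the usage example, the first reply covers one expectation, the follow-up prompt targets the remaining one, and the second reply closes the inner loop after two turns. Passing `covers` and `learner` as callables keeps the loop logic separate from whichever model or human supplies the judgments and replies.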

Paper Structure (30 sections, 4 figures, 6 tables)

This paper contains 30 sections, 4 figures, 6 tables.

Introduction
Related Work
Conversational Tutoring Systems
Content Authoring Tools
System Design and Architecture
Design Considerations
User Interface
AI-Assisted Tutoring Script Authoring
Tutoring Script Orchestration
Experimental Design
Learning Material
Conditions
Surveys/Questionnaires
Recruitment
Evaluation 1: Initial System Validation
...and 15 more sections

Figures (4)

Figure 1: UI of Ruffle&Riley. (a) Learners are asked to teach Ruffle (student agent) in a free-form conversation and request help as needed from Riley (professor agent). Ruffle tries to guide the learner to articulate the expectations in the tutoring script. (b) The learner can navigate the lesson material during the conversation. (c) Ruffle encourages the learner to explain the content. (d) Riley responds to a help request. (e) Riley detected a misconception and prompts the learner to revise their response.
Figure 2: System architecture. Ruffle&Riley generates a tutoring script automatically from a lesson text by executing three separate prompts that induce questions, solutions and expectations for the EMT-based dialog. During the learning process, the script is orchestrated via two LLM-based conversational agents in a free-form dialog that follows the ITS-typical outer and inner loop structure.
Figure 3: Tutoring script. To structure the conversational activity, Ruffle&Riley relies on a pre-generated script featuring a list of questions and related expectations for the EMT-based dialog. Tutoring scripts can be generated automatically from existing lesson texts and offer instructional designers a convenient interface for system configuration.
Figure 4: Temporal Interaction Patterns. By visualizing the usage of text navigation, chat response, and help request features over time, we observe four distinct usage patterns.