Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Rose E. Wang; Ana T. Ribeiro; Carly D. Robinson; Susanna Loeb; Dora Demszky

TL;DR

The paper presents Tutor CoPilot, a human-AI system that gives tutors expert-like guidance in real time by leveraging the Bridge method, a model of expert reasoning. In a preregistered randomized controlled trial with 900 tutors and 1,800 Title I students, the authors show a 4 percentage-point gain in topic mastery, with the largest benefits for students of lower-rated and less-experienced tutors, at an estimated cost of $20 per tutor annually. NLP-based analyses and qualitative tutor feedback indicate that Tutor CoPilot shifts tutoring language toward high-quality, cognitively active strategies and away from simply giving away answers. The work provides evidence that scalable, low-cost, real-time expert guidance can improve learning outcomes in real-world education and suggests broad potential for human-AI collaboration in other high-stakes domains, while acknowledging limitations in generalizability and modality.

Abstract

Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, especially where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality education. We introduce Tutor CoPilot, a novel Human-AI approach that leverages a model of expert thinking to provide expert-like guidance to tutors as they tutor. This study is the first randomized controlled trial of a Human-AI system in live tutoring, involving 900 tutors and 1,800 K-12 students from historically under-served communities. Following a preregistered analysis plan, we find that students working with tutors who have access to Tutor CoPilot are 4 percentage points (p.p.) more likely to master topics (p<0.01). Notably, students of lower-rated tutors experienced the greatest benefit, improving mastery by 9 p.p. We find that Tutor CoPilot costs only $20 per tutor annually. We analyze 550,000+ messages using classifiers to identify pedagogical strategies, and find that tutors with access to Tutor CoPilot are more likely to use high-quality strategies to foster student understanding (e.g., asking guiding questions) and less likely to give away the answer to the student. Tutor interviews highlight how Tutor CoPilot's guidance helps tutors respond to student needs, though they flag issues in Tutor CoPilot, such as generating suggestions that are not grade-level appropriate. Altogether, our study of Tutor CoPilot demonstrates how Human-AI systems can scale expertise in real-world domains, bridge gaps in skills, and create a future where high-quality education is accessible to all students.

Paper Structure (65 sections, 2 equations, 6 figures, 14 tables)

Introduction
Related Work
Training Novices for Complex Real-World Tasks
AI and K-12 Education
Human-AI Collaborative Systems
Tutor CoPilot
Integrating into Real-Time Context (Figure 1a).
Ensuring User Safety and Privacy (Figure 1b).
Generating Expert-Like Guidance (Figure 1c).
Enabling User Customization (Figure 1d).
Study Design
Participants and Randomization
Data
Tutors.
Students.
...and 50 more sections

Figures (6)

Figure 1: Illustration of Tutor CoPilot. (a) Tutor CoPilot is integrated into live contexts as a button which the tutor can activate for real-time assistance during their tutoring sessions. (b) Tutor CoPilot applies user safety and privacy practices, such as automatically de-identifying student and tutor names and limiting the amount of user information sent to external LM services. (c) Tutor CoPilot generates expert-like guidance by leveraging the Bridge method (Wang et al., 2024), which captures expert decision-making from their verbalized reasoning patterns. (d) Tutor CoPilot enables user customization. The tutor can customize the guidance by editing it, re-generating it, or selecting a different strategy.
Figure 2: Heterogeneity analysis by tutor initial effectiveness on student learning. (a) reports results by the tutor's initial quality rating and (b) by the tutor's tutoring experience.
Figure 3: Strategies more likely to be used by control tutors (left) vs. treatment tutors (right). Strategies with a z-score below 1 standard deviation are shaded in gray. Control tutors tended to rely on solution-focused, passive strategies, while treatment tutors more frequently used strategies that promote deeper student engagement and comprehension.
Figure 4: Percentage of treatment sessions that used Tutor CoPilot at least once in their session. About 29% of treatment sessions used Tutor CoPilot during our study.
Figure 5: (a) reports the average number of Tutor CoPilot uses, including the sessions that had no usage, and (b) reports the average number of uses but excludes the sessions with no usage. When including the zero-use sessions, tutors use Tutor CoPilot about 3 times per session. When excluding the zero-use sessions, tutors use Tutor CoPilot about 10 times per session.
...and 1 more figure
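
The usage figures in the captions of Figures 4 and 5 can be cross-checked with simple arithmetic. The sketch below is an illustration only, not the authors' analysis: it assumes the rounded caption values (about 29% of treatment sessions with at least one use, about 10 uses per session when zero-use sessions are excluded) and treats zero-use sessions as contributing 0 to the overall mean. The variable and function names are hypothetical.

```python
# Rounded values read from the Figure 4 and Figure 5 captions (approximate).
share_sessions_using = 0.29    # Figure 4: share of treatment sessions with at least one use
uses_per_active_session = 10   # Figure 5(b): mean uses, zero-use sessions excluded


def mean_uses_all_sessions(share_active: float, mean_uses_active: float) -> float:
    """Mean uses over all sessions, where zero-use sessions contribute 0."""
    return share_active * mean_uses_active


implied = mean_uses_all_sessions(share_sessions_using, uses_per_active_session)
print(f"Implied mean uses per session, zeros included: {implied:.1f}")
```

Under these rounded inputs the implied overall mean is close to the "about 3 times per session" reported for Figure 5(a), so the two captions are mutually consistent.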
