Beyond the AI Tutor: Social Learning with LLM Agents

Harsh Kumar, Zi Kang, Mu, Jonathan Vincentius, Ashton Anderson

Abstract

Most AI-based educational tools today adopt a one-on-one tutoring paradigm, pairing a single LLM with a single learner. Yet decades of learning science research suggest that multi-party interaction -- through peer modeling, co-construction, and exposure to diverse perspectives -- can produce learning benefits that dyadic tutoring alone cannot. In this paper, we investigate whether multi-agent LLM configurations can enhance learning outcomes beyond what a single LLM tutor provides. We present two controlled experiments spanning distinct learning contexts. In a convergent problem-solving study ($N=315$), participants tackle SAT-level math problems in a 2$\times$2 design that varies the presence of an LLM tutor and LLM peers, each making different kinds of errors (conceptual vs. arithmetic); participants who interacted with both a tutor and peers achieved the highest unassisted test accuracy. In a divergent composition study ($N=247$), participants write argumentative and creative essays with either no AI assistance, a single LLM (Claude or ChatGPT), or both Claude and ChatGPT together; while both LLM conditions improved essay quality, only the two-agent condition avoided the idea-level homogeneity that single-model assistance was found to produce. Together, these studies offer one of the first controlled investigations of multi-agent LLM learning environments, probing whether the move from one-on-one AI tutoring toward richer agent configurations can unlock the collaborative and observational benefits long documented in human social learning research.

Paper Structure

This paper contains 34 sections and 9 figures.

Figures (9)

  • Figure 1: Experimental procedure and conditions for Experiment-1. Participants first go through a random topic-selection step that assigns two topics, each with two possible questions (one designated for the lesson phase and one for the test phase). In the lesson phase, participants answer one question per topic (two lesson questions total) under one of four support conditions: Control (no agents), Peers only (two peer agents answer after the participant), Tutor only (a tutor provides feedback and a follow-up question), or Tutor + Peers (peers respond after participant's answer, then the tutor provides feedback and a follow-up). The lesson sequence repeats twice (once per topic). In the test phase, participants answer the remaining question from each topic unassisted (two test questions total).
  • Figure 2: Example lesson-phase interactions from Experiment-1. Left: In the Peers Only condition, Alice (arithmetic errors) and Charlie (conceptual errors) discuss a linear equation with the participant, who initially answered incorrectly. The participant works through the disagreement on the scratchpad and arrives at the correct answer. Right: In the Tutor + Peers condition, Alice and Charlie offer conflicting solutions to an averages problem, and the tutor Bob synthesizes their attempts, identifies errors, and guides the participant step-by-step to the correct answer. In both cases, the participant submitted an incorrect answer before the lesson and solved the problem correctly afterward.
  • Figure 3: Test accuracy by lesson support condition in Experiment-1. Points show mean test accuracy (proportion correct in test round) across participants in each condition; vertical error bars indicate $\pm$1 SEM.
  • Figure 4: Post-study perceptions in Experiment-1. Panels show (left to right) perceived difficulty (1=very easy to 4=very difficult), percent who reported learning something (a little or a lot), percent who believed they answered at least one test question correctly, and percent who reported feeling confused (slightly, moderately, or very). Error bars indicate $\pm$1 SEM across participants.
  • Figure 5: Post-survey perceptions of each agent across four qualities in Experiment-1. Participants rated agents they interacted with on Competence, Warmth, Helpfulness, and Trustworthiness (1--5 Likert). In the lesson, Alice was described as having strong conceptual understanding but occasionally making arithmetic mistakes, whereas Charlie was described as computing accurately but sometimes misunderstanding underlying concepts. Panels show conditions in which the corresponding agents were present.
  • ...and 4 more figures