When LLMs Learn to be Students: The SOEI Framework for Modeling and Evaluating Virtual Student Agents in Educational Interaction

Yiping Ma, Shiyu Hu, Xuchen Li, Yipei Wang, Yuqing Chen, Shiqing Liu, Kang Hao Cheong

TL;DR

The paper proposes the SOEI framework to model and evaluate personality-aligned LLM-based Virtual Student Agents (LVSAs) in educational interactions, addressing the need for principled psychological grounding, scalable evaluation, and classroom validation. It defines a structured pipeline with Scene, Object, Evaluation, and Interaction modules, grounds LVSAs in Big Five traits and junior high school Chinese language instruction, and validates them through a hybrid human and GPT-4 evaluation. The work introduces a unified Task Definition schema, a BCUAE-based scene diagnostic, trait-specific fine-tuning via SWIFT-LoRA, and a multi-turn teacher–LVSA interaction study with real pre-service teachers. Key findings show trait-consistent behavior, measurable improvement after fine-tuning, and pedagogical adaptation by the teachers, supporting the potential of LVSAs for teacher training and AI4Edu/Edu4AI research. The paper highlights both opportunities (scalability, realism, cross-trait controllability) and challenges (data sparsity for the Low Conscientiousness (LC) trait, cross-lingual robustness, long-term classroom deployment) and outlines directions for extending the approach to broader subjects and multimodal contexts.

Abstract

Recent advances in large language models (LLMs) have enabled intelligent tutoring systems, yet the development of LLM-based Virtual Student Agents (LVSAs) remains underexplored. Such agents are essential for teacher-facing applications, where simulating diverse learner traits can support adaptive instruction and pedagogical skill development. However, current methods lack principled personality modeling, scalable evaluation of behavioral consistency, and empirical validation in interactive teaching settings. We propose the SOEI framework, a structured pipeline comprising Scene, Object, Evaluation, and Interaction, for constructing and evaluating personality-aligned LVSAs in classroom scenarios. Leveraging Chinese language instruction as a cognitively and emotionally rich testbed, we generate five LVSAs based on Big Five traits through LoRA fine-tuning and expert-informed prompt design. Their behavioral realism and personality coherence are assessed using a hybrid human & GPT-4 evaluation and a multi-dimensional annotation protocol. Through controlled experiments with real pre-service teachers, we demonstrate that LVSAs can elicit adaptive teaching strategies and maintain trait-consistent behavior across multi-turn dialogues. Our results provide: (1) an educationally and psychologically grounded generation pipeline for LLM-based student agents; (2) a hybrid, scalable evaluation framework for behavioral realism; and (3) empirical insights into the pedagogical utility of LVSAs in shaping instructional adaptation. By embedding LVSAs into both generative modeling and human-in-the-loop teaching, SOEI bridges AI for Education (AI4Edu) and Education for AI (Edu4AI), positioning classroom interaction as a rigorous testbed for controllability, personality alignment, and human-likeness in large language models.

Paper Structure

This paper contains 77 sections, 43 figures, 17 tables.

Figures (43)

  • Figure 1: Overview of the SOEI pipeline, which supports the structured modeling and evaluation of virtual student agents. The framework consists of four modules—Scene, Object, Evaluation, and Interaction—each contributing to data preparation, agent construction, behavior assessment, and dialogue-based validation. HE, HN, LO, HA, and LC correspond to High Extraversion, High Neuroticism, Low Openness, High Agreeableness, and Low Conscientiousness, respectively.
  • Figure 2: Mapping conceptual theory to operational modeling for LVSA: Real student behaviors (left) are grounded in physiological, cognitive, social-emotional, and moral-spiritual development. These are transformed into operational dimensions (right)—personality traits, question-answer types, generation sources, linguistic styles, and learning stages—to construct controllable LVSAs.
  • Figure 3: Statistical information of $D_O$.
  • Figure 4: Word cloud visualization of the Big Five personality fine-tuning dataset.
  • Figure 5: Human evaluation results visualized as a radar chart. Fleiss' kappa = 0.6917 indicates substantial inter-rater agreement across personality types (see the appendix section on human evaluation agreement).
  • ...and 38 more figures
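
The Figure 5 caption reports inter-rater agreement as Fleiss' kappa. As a rough illustration of how that statistic is computed, the sketch below implements the standard formula in plain Python. The function name `fleiss_kappa`, the input layout, and the toy ratings are assumptions made for illustration; they are not the paper's annotation data or evaluation code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of rater pairs
    # that agree on that item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical toy data: 4 responses, 3 raters, 5 categories
    # (HE, HN, LO, HA, LC). These numbers are invented for illustration.
    toy = [
        [3, 0, 0, 0, 0],
        [0, 2, 1, 0, 0],
        [0, 0, 3, 0, 0],
        [0, 0, 0, 1, 2],
    ]
    print(f"kappa = {fleiss_kappa(toy):.4f}")
```

Under the commonly used Landis and Koch bands, values between 0.61 and 0.80 are read as substantial agreement, which matches how the caption describes 0.6917.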