Classroom Simulacra: Building Contextual Student Generative Agents in Online Education for Learning Behavioral Simulation

Songlin Xu, Hao-Ning Wen, Hongyi Pan, Dallas Dominguez, Dongyin Hu, Xinyu Zhang

TL;DR

Classroom Simulacra presents a contextual student-simulation framework that uses large language models (LLMs) augmented with a Transferable Iterative Reflection (TIR) module to model how course materials modulate learning behaviors. The authors address the lack of granular course-material data and the token-length constraints of LLMs by collecting a dataset from a 6-week online workshop with 60 students and by introducing TIR, which enables both prompting-based and finetuning-based LLMs to achieve state-of-the-art results or surpass deep-learning baselines in predicting future post-lecture performance. On both the public EduAgent dataset and the newly collected dataset, TIR-enhanced LLMs capture fine-grained dynamics, inter-student correlations, and lecture-level variability more faithfully than traditional approaches, signaling potential for a digital twin of online classrooms. The work offers practical implications for students, instructors, and parents, and outlines limitations and future directions for broader generalization and richer behavioral modeling.

Abstract

Student simulation helps educators improve teaching through interaction with virtual students. However, most existing approaches ignore the modulation effects of course materials because of two challenges: the lack of datasets with granularly annotated course materials, and the limited ability of existing simulation models to process extremely long textual data. To address these challenges, we first run a 6-week education workshop with N = 60 students to collect fine-grained data using a custom-built online education system, which logs students' learning behaviors as they interact with lecture materials over time. Second, we propose a transferable iterative reflection (TIR) module that augments both prompting-based and finetuning-based large language models (LLMs) for simulating learning behaviors. Our comprehensive experiments show that TIR enables the LLMs to perform more accurate student simulation than classical deep learning models, even with limited demonstration data. Our TIR approach better captures the granular dynamism of learning performance and inter-student correlations in classrooms, paving the way towards a "digital twin" for online education.

Paper Structure

This paper contains 51 sections, 19 figures, 2 tables.

Figures (19)

  • Figure 1: Training (left) and testing (right) schemes for prompting-based models.
  • Figure 2: Training (left) and testing (right) scheme for finetuning-based models.
  • Figure 3: Prompt examples in the Transferable Iterative Reflection process.
  • Figure 4: (a) Illustration of our CogEdu system. (b) Our action prompt strategy for instructors based on attention ratio and knowledge ratio. (c, d) The UI of the user end (c) and server end (d) of CogEdu.
  • Figure 5: A real online education example of our live CogEdu system shown in Fig. 4.
  • ...and 14 more figures