EduAgent: Generative Student Agents in Learning

Songlin Xu, Xinyu Zhang, Lianhui Qin

TL;DR

The paper tackles fine-grained, data-efficient simulation of student learning behaviors in online education, where training data is scarce and richer behavioral signals are needed. It introduces EduAgent, a generative agent framework that injects cognitive priors from cognitive science to guide LLMs in reasoning about the interplay among student personas, course content, and behavioral signals. Two datasets are released: EduAgent310, real-student data with fine-grained annotations ($N=310$ samples), and EduAgent705, virtual data generated by EduAgent ($N=705$ samples). Two experiments, personalized behavior prediction and virtual generative simulation, show that the framework can both predict real student behavior and generate realistic virtual student data without real data.

Abstract

Student simulation in online education is important for addressing the dynamic learning behaviors of students with diverse backgrounds. Existing simulation models based on deep learning usually need massive training data and lack prior knowledge of educational contexts. Large language models (LLMs) may contain such prior knowledge since they are pre-trained on a large corpus. However, because student behaviors are dynamic and multifaceted with individual differences, directly prompting LLMs is neither robust nor accurate enough to capture fine-grained interactions among diverse student personas, learning behaviors, and learning outcomes. This work tackles this problem by presenting a newly annotated fine-grained large-scale dataset and proposing EduAgent, a novel generative agent framework incorporating cognitive prior knowledge (i.e., theoretical findings revealed in cognitive science) to guide LLMs to first reason about correlations among various behaviors and then make simulations. Our two experiments show that EduAgent can not only mimic and predict the learning behaviors of real students but also generate realistic learning behaviors of virtual students without real data.

Paper Structure (20 sections, 18 figures, 3 tables)

Figures (18)

  • Figure 1: Our EduAgent framework.
  • Figure 2:
  • Figure 3: Data distribution in EduAgent310.
  • Figure 4: Distribution of gaze stationary entropy (used to represent workload) and transition entropy (used to represent curiosity) in EduAgent310 dataset.
  • Figure 5: Distribution of each kind of persona in EduAgent705 dataset.
  • ...and 13 more figures
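
The Figure 4 caption refers to gaze stationary entropy and gaze transition entropy. This summary does not say how the paper computes them, so the following is only a minimal sketch of one common formulation, assuming gaze is given as a sequence of fixated areas of interest (AOIs). Stationary entropy is taken as the Shannon entropy of the AOI visit distribution, and transition entropy as the visit-weighted entropy of each AOI's outgoing transitions. The function name `gaze_entropies` and the AOI labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def gaze_entropies(aoi_sequence):
    """Return (stationary_entropy, transition_entropy) in bits.

    Assumption: the stationary distribution is approximated by the
    empirical visit frequency of each AOI in the sequence.
    """
    if len(aoi_sequence) < 2:
        raise ValueError("need at least two fixations")

    # Empirical visit distribution over AOIs.
    total = len(aoi_sequence)
    pi = {aoi: n / total for aoi, n in Counter(aoi_sequence).items()}
    stationary = -sum(p * math.log2(p) for p in pi.values())

    # Count first-order transitions between consecutive fixations.
    transitions = defaultdict(Counter)
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        transitions[src][dst] += 1

    # Weight each source AOI's row entropy by how often it is visited.
    transition = 0.0
    for src, dsts in transitions.items():
        row_total = sum(dsts.values())
        row_entropy = -sum(
            (n / row_total) * math.log2(n / row_total) for n in dsts.values()
        )
        transition += pi[src] * row_entropy

    return stationary, transition


if __name__ == "__main__":
    # Hypothetical AOI labels for a lecture-video screen.
    sequence = ["slide", "face", "slide", "notes", "slide", "face", "slide", "notes"]
    print(gaze_entropies(sequence))
```

Under this formulation, a student whose gaze alternates strictly between two AOIs has a transition entropy of zero, while less predictable switching among AOIs raises it.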