EduAgent: Generative Student Agents in Learning
Songlin Xu, Xinyu Zhang, Lianhui Qin
TL;DR
The paper tackles fine-grained, data-efficient simulation of student learning behaviors in online education, addressing data scarcity and the need for richer behavioral signals. It introduces EduAgent, a generative agent framework that injects cognitive priors from cognitive science to guide LLMs in reasoning about the interplay among student personas, course content, and behavioral signals. Two datasets are released: EduAgent310, real-student data with fine-grained annotations ($N=310$), and EduAgent705, virtual data generated by EduAgent ($N=705$). Two experiments, personalized behavior prediction and virtual generative simulation, show that the framework can both accurately predict real student behavior and generate realistic virtual data, enabling scalable, end-to-end human-in-the-loop educational AI.
Abstract
Student simulation in online education is important for capturing the dynamic learning behaviors of students with diverse backgrounds. Existing simulation models based on deep learning usually need massive training data and lack prior knowledge of educational contexts. Large language models (LLMs) may contain such prior knowledge because they are pre-trained on a large corpus. However, because student behaviors are dynamic and multifaceted, with individual differences, directly prompting LLMs is neither robust nor accurate enough to capture fine-grained interactions among diverse student personas, learning behaviors, and learning outcomes. This work tackles this problem by presenting a newly annotated, fine-grained, large-scale dataset and proposing EduAgent, a novel generative agent framework that incorporates cognitive prior knowledge (i.e., theoretical findings from cognitive science) to guide LLMs to first reason about correlations among various behaviors and then perform simulations. Our two experiments show that EduAgent can not only mimic and predict the learning behaviors of real students but also generate realistic learning behaviors of virtual students without real data.
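The "first reason, then simulate" idea can be illustrated with a minimal two-stage prompting sketch. This is not the authors' implementation: the abstract does not specify prompts or interfaces, so the `SimulationInput` fields, the function names, the prompt wording, and the pluggable `llm` callable (any function from prompt string to completion string) are all hypothetical. The sketch only shows the structure: stage 1 combines a cognitive prior, a student persona, and course content into a reasoning request, and stage 2 conditions the behavior simulation on that reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class SimulationInput:
    """Hypothetical container for the inputs named in the abstract."""
    persona: str          # description of the student (illustrative)
    course_content: str   # description of the material (illustrative)
    cognitive_prior: str  # a finding from cognitive science (illustrative)


def reason_correlations(inp: SimulationInput, llm: LLM) -> str:
    """Stage 1 (hypothetical): ask the model to reason about how the persona
    and content relate to behaviors and outcomes, guided by the prior."""
    prompt = (
        f"Cognitive prior: {inp.cognitive_prior}\n"
        f"Student persona: {inp.persona}\n"
        f"Course content: {inp.course_content}\n"
        "Reason about how these factors are likely to shape this student's "
        "learning behaviors and learning outcomes."
    )
    return llm(prompt)


def simulate_behavior(inp: SimulationInput, reasoning: str, llm: LLM) -> str:
    """Stage 2 (hypothetical): simulate behaviors conditioned on stage 1."""
    prompt = (
        f"Student persona: {inp.persona}\n"
        f"Course content: {inp.course_content}\n"
        f"Reasoning: {reasoning}\n"
        "Simulate this student's learning behaviors and learning outcome."
    )
    return llm(prompt)


def run_pipeline(inp: SimulationInput, llm: LLM) -> Tuple[str, str]:
    """Run both stages and return (reasoning, simulation)."""
    reasoning = reason_correlations(inp, llm)
    return reasoning, simulate_behavior(inp, reasoning, llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a stub function that records prompts is enough to check that the stage-2 prompt actually receives the stage-1 reasoning.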
