BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation
Hanchen David Wang, Clayton Cohn, Zifan Xu, Siyuan Guo, Gautam Biswas, Meiyi Ma
TL;DR
BEAGLE is a neuro-symbolic framework for synthesizing authentic student learning trajectories. Grounded in Self-Regulated Learning (SRL) theory, it combines a semi-Markov controller, Bayesian Knowledge Tracing with explicit flaw injection, and a decoupled Strategist/Executor generation pipeline. This architecture enforces novice-bound behavior and prevents silent self-correction, yielding trajectories that closely mirror real student data across behavioral, epistemic, and perceptual dimensions. Evaluations on Python programming tasks show that BEAGLE reduces competency bias and achieves human-like nonlinearity, and a human Turing test indicates that its traces are indistinguishable from real data within a statistical equivalence bound. Ablation studies confirm the critical role of the semi-Markov dynamics and epistemic constraints in sustaining realistic learning progressions, and the framework enables scalable stress-testing of tutoring interventions.
Abstract
Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research, from training adaptive tutoring systems to stress-testing pedagogical interventions. However, collecting authentic data is challenging due to privacy concerns and the high cost of longitudinal studies. While Large Language Models (LLMs) offer a promising path to student simulation, they suffer from competency bias: they optimize for efficient correctness rather than the erratic, iterative struggle characteristic of novice learners. We present BEAGLE, a neuro-symbolic framework that addresses this bias by incorporating Self-Regulated Learning (SRL) theory into a novel architecture. BEAGLE integrates three key technical innovations: (1) a semi-Markov model that governs the timing and transitions of cognitive and metacognitive behaviors; (2) Bayesian Knowledge Tracing with explicit flaw injection to enforce realistic knowledge gaps and "unknown unknowns"; and (3) a decoupled agent design that separates high-level strategy selection from code generation actions to prevent the model from silently correcting its own intentional errors. In evaluations on Python programming tasks, BEAGLE significantly outperforms state-of-the-art baselines in reproducing authentic trajectories. In a human Turing test, participants could not reliably distinguish synthetic traces from real student data: their accuracy (52.8%) was indistinguishable from random guessing.
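To make the first two components concrete, the following is a minimal Python sketch, not BEAGLE's implementation. It pairs a small semi-Markov walk over behaviors with the standard Bayesian Knowledge Tracing update. The state names, transition probabilities, dwell times, BKT parameters, and the FLAW_CEILING cap used to stand in for flaw injection are all assumptions chosen for illustration; none of them are taken from the paper.

```python
import random
from dataclasses import dataclass

# All states, probabilities, and timings below are illustrative assumptions,
# not values reported for BEAGLE.
TRANSITIONS = {
    "plan": {"code": 0.8, "reflect": 0.2},
    "code": {"debug": 0.6, "plan": 0.2, "reflect": 0.2},
    "debug": {"code": 0.7, "plan": 0.3},
    "reflect": {"plan": 0.5, "code": 0.5},
}
MEAN_DWELL_S = {"plan": 40.0, "code": 90.0, "debug": 60.0, "reflect": 20.0}
FLAW_CEILING = 0.3  # hypothetical cap on mastery for an injected flaw


def simulate_behaviors(rng, start="plan", steps=5):
    """Semi-Markov walk: sample a dwell time for each state, then jump."""
    state, trace = start, []
    for _ in range(steps):
        # Gamma dwell times (shape 2) are non-exponential, so the process is
        # semi-Markov rather than a plain continuous-time Markov chain.
        dwell = rng.gammavariate(2.0, MEAN_DWELL_S[state] / 2.0)
        trace.append((state, dwell))
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
    return trace


@dataclass
class BKT:
    """Standard Bayesian Knowledge Tracing with assumed parameter values."""
    p_learn: float = 0.15
    p_slip: float = 0.10
    p_guess: float = 0.20

    def update(self, p_known, correct, flawed=False):
        # Bayes step: posterior mastery given the observed outcome.
        if correct:
            num = p_known * (1 - self.p_slip)
            den = num + (1 - p_known) * self.p_guess
        else:
            num = p_known * self.p_slip
            den = num + (1 - p_known) * (1 - self.p_guess)
        posterior = num / den
        # Learning step: chance of acquiring the skill after this attempt.
        updated = posterior + (1 - posterior) * self.p_learn
        # Assumed stand-in for flaw injection: a flawed skill stays capped.
        return min(updated, FLAW_CEILING) if flawed else updated
```

Calling simulate_behaviors(random.Random(0)) returns a list of (state, dwell_seconds) pairs, and BKT().update(0.5, correct=True) returns the revised mastery estimate for one skill. Capping a flawed skill is only one simple way to keep a simulated learner from quietly mastering a misconception; the paper's actual mechanism may differ.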
