Large Language Models for In-Context Student Modeling: Synthesizing Student's Behavior in Visual Programming
Manh Hung Nguyen, Sebastian Tschiatschek, Adish Singla
TL;DR
The paper addresses student modeling in open-ended domains: given a student’s observed behavior on a reference task, the goal is to synthesize that student’s attempt on a target task. It introduces LLM-SS, a perturbation-based, in-context learning framework that uses domain-specific fine-tuning to inject expert knowledge and infer student misconceptions, producing synthesized attempts $\widehat{C}^{\textsc{stu}}_{T^{\textnormal{tar}}}$. The formal problem setup is a two-step process over a task space $\mathbb{T}$ and a code space $\mathbb{C}$, with synthesized attempts scored by a quality rubric consisting of $Q_{\text{stu}}$, $Q_{\text{task}}$, and $Q_{\text{overall}} = Q_{\text{stu}} \times Q_{\text{task}}$, and is demonstrated on the HoCMaze/StudentSyn benchmark. Experimental results show that fine-tuned LLMs substantially improve synthesis quality over NeurSS and, in some cases, approach human tutor performance, highlighting the framework’s potential to scale in-context student modeling without heavy training pipelines.
Abstract
Student modeling is central to many educational technologies, as it enables predicting future learning outcomes and designing targeted instructional strategies. However, open-ended learning domains make it difficult to model students accurately because of the diversity of student behaviors and the large space of possible misconceptions. To address these challenges, we explore the application of large language models (LLMs) for in-context student modeling in open-ended learning domains. More concretely, given a particular student's attempt on a reference task as an observation, the objective is to synthesize the student's attempt on a target task. We introduce a novel framework, LLM for Student Synthesis (LLM-SS), that leverages LLMs for synthesizing a student's behavior. Our framework can be combined with different LLMs; moreover, we fine-tune LLMs to boost their student modeling capabilities. We instantiate several methods based on the LLM-SS framework and evaluate them using an existing benchmark, StudentSyn, for student attempt synthesis in a visual programming domain. Experimental results show that our methods perform significantly better than NeurSS, the baseline method provided in the StudentSyn benchmark. Furthermore, our method using a fine-tuned version of the GPT-3.5 model performs significantly better than the same method using the base GPT-3.5 model and gets close to human tutors' performance.
