Simulating Novice Students Using Machine Unlearning and Relearning in Large Language Models

Jiajia Song, Zhihan Guo, Jionghao Lin

Abstract

Student simulation can support learning-by-teaching pedagogy where human students (as tutors) teach AI-simulated novice students (as tutees). Recent research often relies on prompt engineering with large language models (LLMs) to simulate novice student behaviour, but it is difficult to keep the AI-simulated student at a stable novice knowledge level. A key reason is that many LLMs are trained to be broadly capable, so even when prompted to "act like a novice," the LLMs can still produce expert-level explanations during the learning-by-teaching interaction process. As a result, the AI-simulated student may drift beyond the intended knowledge level, reducing the credibility of the simulation for studying learning-by-teaching processes. Thus, we propose a knowledge-level simulation approach based on machine unlearning. We investigate this approach using a dataset of multiple-choice questions on Python programming concepts. We apply machine unlearning to transform a knowledgeable LLM into a novice-level AI student (i.e., teachable agent), then evaluate whether the teachable agent can relearn targeted knowledge components through learning-by-teaching dialogue interactions. Finally, we analyse the dialogue logs to characterise how the agent's behaviour changes over time, including its question asking, error patterns, and responsiveness to instruction. The results show that (1) unlearning produces simulated student agents with more novice-like responses than prompt-only baselines, (2) the agents recover a measurable portion of the unlearned knowledge under structured exposure, and (3) dialogue analyses reveal identifiable trajectories of conceptual change and teaching moves that predict learning recovery.
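
The abstract summarises the pipeline (unlearn, then relearn through dialogue) without fixing a particular unlearning algorithm, so the sketch below illustrates one common choice: gradient ascent on a designated forget set, written against the Hugging Face transformers API. The model name, forget-set contents, and hyperparameters are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of machine unlearning via gradient ascent on a forget set.
# The paper does not specify its unlearning method here; the model name,
# data, and hyperparameters below are illustrative assumptions only.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the authors' base LLM is not stated here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack one
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical forget set: explanations tied to the knowledge components
# (e.g., exception handling) that the teachable agent should not know.
forget_set = [
    "Q: What does try/except do in Python? A: It catches exceptions ...",
]

def unlearn_step(batch_texts):
    """Ascend (rather than descend) the LM loss on the forget set, making
    the model less likely to reproduce the targeted knowledge."""
    inputs = tokenizer(batch_texts, return_tensors="pt",
                       padding=True, truncation=True)
    labels = inputs["input_ids"].clone()
    labels[inputs["attention_mask"] == 0] = -100  # ignore pad positions
    loss = model(**inputs, labels=labels).loss
    (-loss).backward()            # negated loss => gradient ascent
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

for epoch in range(3):            # assumption: small fixed epoch budget
    for batch in DataLoader(forget_set, batch_size=2):
        unlearn_step(batch)
```

After unlearning, the same model would be re-evaluated on the held-out MCQ items to confirm it answers at a novice level before entering the relearning dialogues.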

Paper Structure

This paper contains 17 sections, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Two-stage training process of the teachable agent's underlying model.
  • Figure 2: Data preprocessing and splitting strategy for the machine unlearning process.
  • Figure 3: Simulating a three-round interaction process between a coach agent and a teachable agent; a minimal sketch of this loop follows the figure list.
  • Figure 4: Performance under unlearning ratios from 10% to 50%: (a) classification accuracy on MCQ answering and (b) F1 score. Error bars indicate the standard deviation across five random seeds.
  • Figure 5: Comparison of answer correctness scores (combining answer selection and explanation correctness) among models with different unlearning ratios (10%, 30%, and 50%) under instruction from an LLM-based coach agent on the topic of Exception Handling. These ratios are selected as representative settings for clarity of presentation; results for other unlearning ratios can be found at https://github.com/GEMLab-HKU/Unlearn_and_Relearn/blob/main/pic/pic3.png.
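
Figure 3's interaction loop can be pictured with a short driver like the one below. The `chat` callable, system prompts, roles, and message format are hypothetical stand-ins for whatever LLM interface the authors use, not their implementation.

```python
# Hypothetical driver for Figure 3's three-round exchange between a coach
# agent and the (unlearned) teachable agent. `chat` stands in for any
# chat-completion backend; prompts and roles are illustrative assumptions.
from typing import Callable, Dict, List

Message = Dict[str, str]
ChatFn = Callable[[str, List[Message]], str]

COACH_SYSTEM = "You are a coach: explain the concept and probe misconceptions."
TUTEE_SYSTEM = "You are a novice student: answer only from what you know."

def run_session(chat: ChatFn, question: str, rounds: int = 3) -> List[Message]:
    """Alternate coach and tutee turns, returning the dialogue log that
    downstream analyses (question asking, error patterns) would consume."""
    log: List[Message] = [{"role": "task", "content": question}]
    for _ in range(rounds):
        coach_turn = chat(COACH_SYSTEM, log)   # coach instructs / probes
        log.append({"role": "coach", "content": coach_turn})
        tutee_turn = chat(TUTEE_SYSTEM, log)   # teachable agent responds
        log.append({"role": "tutee", "content": tutee_turn})
    return log
```

A real session would wrap an actual API or local-model call in `chat`, then feed the returned log into the MCQ re-evaluation reported in Figures 4 and 5.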