Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation

Xinyi Lu; Xu Wang

TL;DR

Problem: evaluating the quality of automatically generated MCQs without extensive real-student data. Method: Generative Students, a prompt architecture grounded in the Knowledge-Learning-Instruction (KLI) framework, simulates student profiles from mastered, confused, and unknown Knowledge Components (KCs) and uses GPT-4 to generate each profile's responses to MCQs on heuristic evaluation. Contributions: (i) demonstrates believable, profile-consistent generative responses; (ii) achieves a strong correlation with real-student responses ($r=0.72$) and overlap in the items identified as easy or hard; and (iii) provides a practical use case where instructors revise items based on generative signals, with a classroom improvement of $0.248$ in average score ($p=0.01$). Significance: offers a scalable approach for rapid prototyping and quality control of MCQs across domains when real-student data is sparse, while outlining risks and the need for expert input to guide KC definitions and prompt design.

Abstract

Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is a function of the list of knowledge components the student has mastered, has confusion about, or has no evidence of knowledge of. We instantiate the Generative Students concept on the subject domain of heuristic evaluation. We created 45 generative students using GPT-4 and had them respond to 20 MCQs. We found that the generative students produced logical and believable responses that were aligned with their profiles. We then compared the generative students' responses to real students' responses on the same set of MCQs and found a high correlation. Moreover, there was considerable overlap in the difficult questions identified by generative students and real students. A subsequent case study demonstrated that an instructor could improve question quality based on the signals provided by Generative Students.
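The comparison between generative and real students described above can be illustrated with a small sketch. It assumes the correlation is computed over per-question correct rates, which the abstract does not state; the function names `item_accuracy` and `pearson_r` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def item_accuracy(responses, answer_key):
    """Fraction of respondents who answered each question correctly.

    responses: one list of chosen options per respondent
    answer_key: the correct option for each question
    """
    n = len(responses)
    return [
        sum(r[i] == key for r in responses) / n
        for i, key in enumerate(answer_key)
    ]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Under this assumption, one would call `item_accuracy` once on the generative students' answers and once on the real students' answers, then pass the two resulting lists to `pearson_r`. Questions with low accuracy in both lists would be the candidates for the overlapping "difficult" items.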

Paper Structure (45 sections, 2 figures, 8 tables)

Introduction
Related Work
Automatic Question Generation for Educational Purposes
Metrics and Approaches to Evaluate Questions
Generative agents
Prompt Engineering
Generative Students Prompt Architecture
Implementation of Generative Students on Heuristic Evaluation
Final Prompt Structure and Examples
Input to the Prompt Template
Takeaways from the Prompt Engineering Process
Providing example MCQs and answers improves performance.
Asking the model to role-play as an instructor and predict the generative student's answer helps.
Using unknown rules to increase uncertainty in the predicted answers
Introducing uncertainty within the confusion prompt component by providing both positive and negative examples.
...and 30 more sections

Figures (2)

Figure 1: The prompt template has three main parts: 1) an introduction of the task (c1); 2) an illustration of the generative student profile (c2); 3) a new MCQ that the generative student will answer (c3). The generative student profile is a function of the list of heuristic rules the student has mastered, has confusion about, or has no evidence of knowledge of (a). For each mastered heuristic rule, we used an example MCQ to indicate the student has sufficient knowledge (b2); for each pair of confusion heuristic rules, we used two example MCQs to indicate the student may mistakenly choose one over the other (b1).
Figure 2: The focused confusion prompt (right) contains the two original questions that the student got wrong (Q1, Q2), and two additional examples to show that the student may answer easy questions correctly (Q3, Q4). Generative students who use the focused confusion prompt are expected to have better overall performance. The focused confusion prompt aims to introduce more uncertainty to better simulate realistic scenarios.
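
The three-part template described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the paper's prompt: the `StudentProfile` class, the `build_prompt` function, and all prompt wording are hypothetical placeholders, and the example MCQs are assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class StudentProfile:
    """Hypothetical container for one generative student's knowledge state."""
    mastered: list = field(default_factory=list)  # rules the student knows
    confused: list = field(default_factory=list)  # (rule_a, rule_b) pairs
    unknown: list = field(default_factory=list)   # rules with no evidence


def build_prompt(profile, mastered_examples, confusion_examples, new_mcq):
    """Assemble task introduction, student profile, and new MCQ.

    mastered_examples: dict mapping a rule to one example MCQ
    confusion_examples: dict mapping a (rule_a, rule_b) pair to two example MCQs
    All wording here is placeholder text, not the paper's prompt.
    """
    # Part 1: task introduction
    lines = ["Task: predict how the student described below answers a new MCQ."]
    # Part 2: student profile built from the three rule lists
    lines.append("Student profile:")
    for rule in profile.mastered:
        lines.append(f"- Mastered '{rule}'. Example: {mastered_examples[rule]}")
    for pair in profile.confused:
        first, second = confusion_examples[pair]
        lines.append(
            f"- Confuses '{pair[0]}' with '{pair[1]}'. Examples: {first} | {second}"
        )
    for rule in profile.unknown:
        lines.append(f"- No evidence of knowing '{rule}'.")
    # Part 3: the new question to answer
    lines.append(f"New MCQ: {new_mcq}")
    return "\n".join(lines)
```

The resulting string would then be sent to the LLM. A focused confusion prompt in the sense of Figure 2 could, under the same assumptions, be approximated by passing additional example MCQs for each confused pair.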
