
Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation

Xinyi Lu, Xu Wang

TL;DR

Problem: evaluating the quality of automatically generated MCQs without extensive real-student data. Method: Generative Students, a prompt architecture grounded in the Knowledge-Learning-Instruction (KLI) framework, defines each simulated student profile by the Knowledge Components (KCs) the student has mastered, is confused about, or shows no evidence of knowing, and uses GPT-4 to generate each profile's responses to MCQs on heuristic evaluation. Contributions: (i) demonstrates believable, profile-consistent generated responses, (ii) achieves a strong correlation with real-student responses ($r=0.72$) and overlap in the items identified as easy or hard, and (iii) provides a practical use case where instructors revise items based on signals from the generative students, with a classroom improvement of $0.248$ in average score ($p=0.01$). Significance: offers a scalable approach that needs little real-student data for rapid prototyping and quality control of MCQs across domains, while outlining risks and the need for expert input to guide KC definitions and prompt design.

Abstract

Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is a function of the list of knowledge components the student has mastered, has confusion about, or has no evidence of knowledge of. We instantiate the Generative Students concept on the subject domain of heuristic evaluation. We created 45 generative students using GPT-4 and had them respond to 20 MCQs. We found that the generative students produced logical and believable responses that were aligned with their profiles. We then compared the generative students' responses to real students' responses on the same set of MCQs and found a high correlation. Moreover, there was considerable overlap in the difficult questions identified by generative students and real students. A subsequent case study demonstrated that an instructor could improve question quality based on the signals provided by Generative Students.
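The comparison between generative and real students can be illustrated with a minimal sketch. It assumes the correlation is a Pearson coefficient over per-item proportion correct, which the abstract does not spell out; the function names (`item_accuracy`, `pearson_r`, `hardest_items`) and the tiny response matrices are hypothetical and are not data from the paper.

```python
import math

def item_accuracy(responses):
    """Proportion of students answering each item correctly.

    `responses` is a list of rows, one per student; each row holds 1 (correct)
    or 0 (incorrect) for every item.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

def hardest_items(accuracy, k):
    """Indices of the k items with the lowest proportion correct."""
    return set(sorted(range(len(accuracy)), key=lambda j: accuracy[j])[:k])

# Made-up toy data: 3 students x 4 items per group (illustration only).
generative = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
real = [[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0]]

gen_acc = item_accuracy(generative)
real_acc = item_accuracy(real)
r = pearson_r(gen_acc, real_acc)
shared_hard = hardest_items(gen_acc, 2) & hardest_items(real_acc, 2)
```

Comparing the two sets of hardest items is one simple way to express the "overlap in difficult questions" idea; the paper may define difficulty differently.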

Paper Structure (45 sections, 2 figures, 8 tables)


Figures (2)

  • Figure 1: The prompt template has three main parts: 1) an introduction of the task (c1); 2) an illustration of the generative student profile (c2); 3) a new MCQ that the generative student will answer (c3). The generative student profile is a function of the list of heuristic rules the student has mastered, has confusion about, or has no evidence of knowledge of (a). For each mastered heuristic rule, we used an example MCQ to indicate the student has sufficient knowledge (b2); for each pair of confused heuristic rules, we used two example MCQs to indicate the student may mistakenly choose one over the other (b1).
  • Figure 2: The focused confusion prompt (right) contains the two original questions that the student got wrong (Q1, Q2), and two additional examples to show that the student may answer easy questions correctly (Q3, Q4). Generative students who use the focused confusion prompt are expected to have better overall performance. The focused confusion prompt aims to introduce more uncertainty to better simulate realistic scenarios.
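
The three-part prompt described in the Figure 1 caption could be assembled along the following lines. This is a rough sketch, not the authors' template: the class names, field names, prompt wording, and the example question stems are all hypothetical, and it assumes that rules with no evidence of knowledge are simply left out of the prompt.

```python
from dataclasses import dataclass, field

@dataclass
class ExampleMCQ:
    stem: str
    options: dict[str, str]   # option label -> heuristic rule name
    student_answer: str       # label the simulated student chose
    correct_answer: str

@dataclass
class StudentProfile:
    # One example MCQ per mastered rule; two per confused pair (per the caption).
    mastered: dict[str, ExampleMCQ] = field(default_factory=dict)
    confused: dict[tuple[str, str], list[ExampleMCQ]] = field(default_factory=dict)
    # Assumption: rules with no evidence are recorded but never shown in the prompt.
    unknown: list[str] = field(default_factory=list)

def render_mcq(q: ExampleMCQ, reveal: bool = True) -> str:
    lines = [q.stem] + [f"  {label}. {text}" for label, text in q.options.items()]
    if reveal:
        lines.append(f"  Student answered: {q.student_answer} (correct: {q.correct_answer})")
    return "\n".join(lines)

def build_prompt(profile: StudentProfile, new_question: ExampleMCQ) -> str:
    # Part 1: task introduction (hypothetical wording).
    parts = ["You will answer a multiple-choice question as the student described below."]
    # Part 2: the profile, illustrated through example MCQs.
    for rule, example in profile.mastered.items():
        parts.append(f"The student has mastered '{rule}':\n{render_mcq(example)}")
    for (rule_a, rule_b), examples in profile.confused.items():
        shown = "\n".join(render_mcq(e) for e in examples)
        parts.append(f"The student confuses '{rule_a}' with '{rule_b}':\n{shown}")
    # Part 3: the new MCQ, shown without any answer.
    parts.append("New question:\n" + render_mcq(new_question, reveal=False))
    return "\n\n".join(parts)

# Hypothetical usage with made-up question stems.
OPTIONS = {"A": "Visibility of system status", "B": "Error prevention"}
profile = StudentProfile(
    mastered={"Visibility of system status": ExampleMCQ(
        "A progress bar appears during upload. Which heuristic applies?", OPTIONS, "A", "A")},
    confused={("Error prevention", "Visibility of system status"): [
        ExampleMCQ("A form disables Submit until all fields are valid.", OPTIONS, "A", "B"),
        ExampleMCQ("A dialog asks for confirmation before deleting.", OPTIONS, "A", "B"),
    ]},
    unknown=["Flexibility and efficiency of use"],
)
new_q = ExampleMCQ("A spinner shows while results load.", OPTIONS, "", "A")
prompt = build_prompt(profile, new_q)
```

The resulting string would then be sent to the model; that call is omitted here because the source does not describe the API settings used.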