How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Jionghao Lin; Zifei Han; Danielle R. Thomas; Ashish Gurung; Shivang Gupta; Vincent Aleven; Kenneth R. Koedinger

TL;DR

This paper tackles the problem of scaling tutor training by using GPT-4 to automatically identify incorrect trainee responses and rephrase them into desired, corrective forms across three scenario-based lessons. The authors show that few-shot prompting yields strong classification performance (an average $F_1$ score of 0.84 and an AUC of 0.85) and that GPT-4 produces rephrasings whose accuracy often matches or exceeds that of human experts, enabling real-time explanatory feedback within a template-based system. The work makes two key contributions: a binary classifier for tutor responses and a rephrasing module that turns incorrect responses into correct ones, both evaluated against human annotations and expert rephrasings. The approach offers a scalable pathway for improving novice tutor training and could be integrated into synchronous tutoring platforms; future work includes extending it to more lessons, exploring more advanced prompting strategies, and adding human-in-the-loop quality control.
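The few-shot classification step can be pictured as placing a handful of labeled trainee responses in front of the new response and asking the model to complete the label. The sketch below is hypothetical: the paper's actual prompts are not reproduced here, and the prompt layout, the names `Example`, `build_few_shot_prompt`, and `parse_label`, and the demonstration responses are all assumptions. The GPT-4 call itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A labeled trainee response used as a few-shot demonstration (hypothetical)."""
    response: str
    label: str  # "correct" or "incorrect"


def build_few_shot_prompt(instruction: str, examples: list[Example], new_response: str) -> str:
    """Assemble a few-shot classification prompt in an assumed Response/Label layout."""
    lines = [instruction.strip(), ""]
    for ex in examples:
        lines += [f"Response: {ex.response}", f"Label: {ex.label}", ""]
    # The unlabeled response goes last; the model is expected to complete the label.
    lines += [f"Response: {new_response}", "Label:"]
    return "\n".join(lines)


def parse_label(model_output: str) -> int:
    """Map the model's completion to 1 (correct) or 0 (incorrect)."""
    text = model_output.strip().lower()
    # Check "incorrect" first, because "correct" is a substring of it.
    if text.startswith("incorrect"):
        return 0
    if text.startswith("correct"):
        return 1
    raise ValueError(f"unrecognized label: {model_output!r}")


# Usage with made-up demonstrations for the Giving Effective Praise lesson.
instruction = ("Label the tutor's reply 'correct' if it praises the student's effort "
               "and 'incorrect' if it praises only the outcome.")
shots = [
    Example("You kept trying different strategies until it worked. Great effort!", "correct"),
    Example("You got it right, you're so smart!", "incorrect"),
]
prompt = build_few_shot_prompt(instruction, shots, "Nice job, that answer is correct.")
print(prompt)
print(parse_label(" Incorrect."))  # prints 0
```

Keeping the label as the final, open line of the prompt makes the model's completion short and easy to map back to a binary value.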

Abstract

One-on-one tutoring is widely acknowledged as an effective instructional method, provided that tutors are qualified. However, the high demand for qualified tutors remains a challenge, often necessitating the training of novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that timely explanatory feedback can facilitate trainees' learning. However, providing such feedback is challenging because assessing trainee performance is time-consuming for human experts. Inspired by recent advances in large language models (LLMs), our study employed the GPT-4 model to build an explanatory feedback system. This system classifies trainees' responses as correct or incorrect and automatically provides template-based feedback in which incorrect responses are appropriately rephrased by the GPT-4 model. We conducted our study on 410 responses from trainees across three training lessons: Giving Effective Praise, Reacting to Errors, and Determining What Students Know. Our findings indicate that: 1) using a few-shot approach, the GPT-4 model effectively identifies correct and incorrect trainee responses from the three training lessons, with an average F1 score of 0.84 and an AUC score of 0.85; and 2) using the few-shot approach, the GPT-4 model adeptly rephrases incorrect trainee responses into desired responses, achieving performance comparable to that of human experts.
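The abstract reports classifier quality as an F1 score and an AUC score. The pure-Python sketch below shows how both can be computed for a binary correct/incorrect classifier. It assumes that "correct" (label 1) is the positive class; the exact averaging across lessons used in the paper is not stated here, and the toy labels are invented for illustration, not taken from the study. When only hard labels are available, AUC reduces to the mean of the true-positive and true-negative rates.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = correct response, 0 = incorrect); not data from the paper.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, y_pred), 3))   # 0.667 (hard labels used as scores)
```

The pairwise definition of AUC is quadratic in the number of responses, which is negligible at the scale of a few hundred responses; a rank-based formula would be preferable for much larger sets.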

Paper Structure

This paper contains 18 sections, 2 equations, 6 figures, and 11 tables.

Introduction
Related Work
Significance of Feedback on Learning
Feedback Generation
Using Large Language Models for Feedback Generation
Method
Data
Annotation for Trainee's Responses
Identifying desired trainee responses
Enhancing the trainee responses by GPT models
Evaluation approach
Results
Results for RQ1: Binary Classifier for Correct Responses
Results for RQ2: Using GPT-4 to Rephrase Incorrect Responses
Discussion
...and 3 more sections

Figures (6)

Figure 1: An example of a trainee (i.e., novice tutor) incorrectly responding to an open-ended question on how best to reply to a student by giving effective praise. In this example, the trainee praises the student for getting the problem correct, which is achievement- or outcome-based praise rather than effort-based praise.
Figure 2: Explanatory feedback for novice tutor responses.
Figure 3: Distribution of accuracy and responsiveness scores from the lesson Giving Effective Praise
Figure 4: Distribution of accuracy and responsiveness scores from the lesson Reacting to Errors
Figure 5: Distribution of accuracy and responsiveness scores from the lesson Determining What Students Know
...and 1 more figure
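Figure 2 and the abstract describe explanatory feedback that is template-based and includes a GPT-4 rephrasing of an incorrect response. As a rough sketch under stated assumptions, the code below fills one of two string templates depending on the binary judgment. The template wording, the function `explanatory_feedback`, and `stub_rephrase` (a placeholder standing in for the actual GPT-4 call) are hypothetical and not taken from the paper.

```python
from string import Template
from typing import Callable

# Hypothetical feedback templates; the paper's actual wording is not reproduced here.
CORRECT_TEMPLATE = Template("Well done! Your response meets the goal of the $lesson lesson.")
INCORRECT_TEMPLATE = Template(
    "Your response does not yet meet the goal of the $lesson lesson.\n"
    'You wrote: "$response"\n'
    'A more effective response might be: "$rephrased"'
)


def explanatory_feedback(lesson: str, response: str, is_correct: bool,
                         rephrase: Callable[[str], str]) -> str:
    """Fill a feedback template, calling the rephraser only for incorrect responses."""
    if is_correct:
        return CORRECT_TEMPLATE.substitute(lesson=lesson)
    return INCORRECT_TEMPLATE.substitute(
        lesson=lesson, response=response, rephrased=rephrase(response)
    )


def stub_rephrase(text: str) -> str:
    """Stand-in for a GPT-4 rephrasing call; returns a fixed effort-based praise."""
    return "You kept working through each step even when it got tricky. Great persistence!"


msg = explanatory_feedback("Giving Effective Praise",
                           "You got it right, you're so smart!", False, stub_rephrase)
print(msg)
```

Passing the rephraser in as a callable keeps the template logic independent of any particular model client and means the (comparatively slow) rephrasing call is made only for responses judged incorrect.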
