Table of Contents
Fetching ...

Understanding Teacher Revisions of Large Language Model-Generated Feedback

Conrad Borchers, Luiz Rodrigues, Newarney Torrezão da Costa, Cleon Xavier, Rafael Ferreira Mello

Abstract

Large language models (LLMs) increasingly generate formative feedback for students, yet little is known about how teachers revise this feedback before it reaches learners. Teachers' revisions shape what students receive, making revision practices central to evaluating AI classroom tools. We analyze a dataset of 1,349 instances of AI-generated feedback and corresponding teacher-edited explanations from 117 teachers. We examine (i) textual characteristics associated with teacher revisions, (ii) whether revision decisions can be predicted from the AI feedback text, and (iii) how revisions change the pedagogical type of feedback delivered. First, we find that teachers accept AI feedback without modification in about 80% of cases, while edited feedback tends to be significantly longer and subsequently shortened by teachers. Editing behavior varies substantially across teachers: about 50% never edit AI feedback, and only about 10% edit more than two-thirds of feedback instances. Second, machine learning models trained only on the AI feedback text as input features, using sentence embeddings, achieve fair performance in identifying which feedback will be revised (AUC=0.75). Third, qualitative coding shows that when revisions occur, teachers often simplify AI-generated feedback, shifting it away from high-information explanations toward more concise, corrective forms. Together, these findings characterize how teachers engage with AI-generated feedback in practice and highlight opportunities to design feedback systems that better align with teacher priorities while reducing unnecessary editing effort.

Understanding Teacher Revisions of Large Language Model-Generated Feedback

Abstract

Large language models (LLMs) increasingly generate formative feedback for students, yet little is known about how teachers revise this feedback before it reaches learners. Teachers' revisions shape what students receive, making revision practices central to evaluating AI classroom tools. We analyze a dataset of 1,349 instances of AI-generated feedback and corresponding teacher-edited explanations from 117 teachers. We examine (i) textual characteristics associated with teacher revisions, (ii) whether revision decisions can be predicted from the AI feedback text, and (iii) how revisions change the pedagogical type of feedback delivered. First, we find that teachers accept AI feedback without modification in about 80% of cases, while edited feedback tends to be significantly longer and subsequently shortened by teachers. Editing behavior varies substantially across teachers: about 50% never edit AI feedback, and only about 10% edit more than two-thirds of feedback instances. Second, machine learning models trained only on the AI feedback text as input features, using sentence embeddings, achieve fair performance in identifying which feedback will be revised (AUC=0.75). Third, qualitative coding shows that when revisions occur, teachers often simplify AI-generated feedback, shifting it away from high-information explanations toward more concise, corrective forms. Together, these findings characterize how teachers engage with AI-generated feedback in practice and highlight opportunities to design feedback systems that better align with teacher priorities while reducing unnecessary editing effort.

Paper Structure

This paper contains 22 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: LLM-supported feedback generation. Screenshot of the teacher's tag, evaluation, and feedback module. The red area shows the feedback generated after clicking the Generate Feedback button. Additionally, the screen displays the student's response and two tags created by a teacher.
  • Figure 2: Cumulative distribution of the percentage of messages changed by teachers. The x-axis shows the percentage of teachers (sorted by change rate), and the y-axis shows the percentage of messages modified. Dashed lines indicate teacher percentiles.