Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use
Xinyi Lu, Aditya Mahesh, Zejia Shen, Mitchell Dudley, Larissa Sano, Xu Wang
TL;DR
The paper investigates whether LLM-generated feedback can serve as actionable suggestions to speed and improve human instructors' feedback on knowledge-intensive economics essays. It introduces a modular feedback pipeline that identifies relevant essay sentences per rubric, assesses rubric satisfaction, and generates rubric-aligned feedback, visualized in a Word plugin with highlights. Through think-aloud studies with five TAs across four assignments, the study finds AI feedback often mirrors effective feedback features and can be more rubric-aligned and personalized than historic feedback, though it risks rigidity and fragmentation; TAs view AI suggestions as a way to accelerate grading and improve consistency when used collaboratively with human judgment. Design implications emphasize detailed rubrics, transparent highlighting of AI reasoning, and intermediate AI outputs to support trustworthy human-AI collaboration, with broader potential to scale feedback quality in knowledge-intensive courses. Limitations include hallucinations and edge cases not captured by rubrics; future work should assess student learning outcomes and extend the approach to other disciplines and course contexts.
Abstract
This project examines the prospect of using AI-generated feedback as suggestions to expedite and enhance human instructors' feedback provision. In particular, we focus on understanding teaching assistants' perspectives on the quality of AI-generated feedback and how they may or may not use AI feedback in their own workflows. We situate our work in a foundational college Economics class that assigns frequent short essays. We developed an LLM-powered feedback engine that generates feedback on students' essays based on the grading rubrics used by the teaching assistants (TAs). To ensure that TAs could meaningfully critique and engage with the AI feedback, we first had them complete their regular grading work. For a randomly selected set of essays they had graded, we used our feedback engine to generate feedback and displayed it as in-text comments in a Word document. We then conducted think-aloud studies with 5 TAs over 20 one-hour sessions in which they evaluated the AI feedback, contrasted it with their handwritten feedback, and shared how they would envision using the AI feedback if it were offered as suggestions. The study highlights the importance of providing detailed rubrics for AI to generate high-quality feedback on knowledge-intensive essays. TAs considered that using AI feedback as suggestions during grading could expedite grading, enhance consistency, and improve overall feedback quality. We discuss the importance of decomposing the feedback generation task into steps and presenting intermediate results so that TAs can make effective use of the AI feedback.
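The TL;DR names three stages: identify the essay sentences relevant to each rubric item, assess whether the item is satisfied, and generate rubric-aligned feedback. The sketch below illustrates that decomposition and why keeping intermediate outputs helps. It is a minimal illustration under our own assumptions, not the authors' implementation. The prompt wording, the `RubricItem` and `ItemResult` types, the naive sentence splitter, and the `stub_llm` stand-in are all hypothetical. Each rubric item keeps its evidence sentences and yes/no judgment alongside the final comment, so a front end could show them, for example as highlights.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to the model's reply.
LLM = Callable[[str], str]


@dataclass
class RubricItem:
    criterion: str


@dataclass
class ItemResult:
    criterion: str
    evidence: list[str]   # stage 1 output: sentences judged relevant (could be highlighted)
    satisfied: bool       # stage 2 output: is the criterion met?
    comment: str          # stage 3 output: rubric-aligned feedback text


def split_sentences(essay: str) -> list[str]:
    # Naive period-based splitter for illustration; breaks on decimals and abbreviations.
    return [s.strip() + "." for s in essay.split(".") if s.strip()]


def find_evidence(llm: LLM, item: RubricItem, sentences: list[str]) -> list[str]:
    # Stage 1: ask the model which numbered sentences bear on this criterion.
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    reply = llm(f"RELEVANT\nCriterion: {item.criterion}\n"
                "Reply with comma-separated sentence indices, or NONE.\n"
                f"Sentences:\n{numbered}")
    indices = [int(t) for t in reply.split(",") if t.strip().isdigit()]
    return [sentences[i] for i in indices if 0 <= i < len(sentences)]


def assess(llm: LLM, item: RubricItem, evidence: list[str]) -> bool:
    # Stage 2: judge the criterion using only the extracted evidence.
    reply = llm(f"ASSESS\nCriterion: {item.criterion}\nReply YES or NO.\n"
                "Evidence:\n" + "\n".join(evidence))
    return reply.strip().upper().startswith("YES")


def write_comment(llm: LLM, item: RubricItem, evidence: list[str], satisfied: bool) -> str:
    # Stage 3: generate feedback conditioned on the earlier intermediate outputs.
    reply = llm(f"COMMENT\nCriterion: {item.criterion}\n"
                f"Satisfied: {'YES' if satisfied else 'NO'}\n"
                "Evidence:\n" + "\n".join(evidence))
    return reply.strip()


def generate_feedback(llm: LLM, essay: str, rubric: list[RubricItem]) -> list[ItemResult]:
    # Run the three stages per rubric item and keep every intermediate result.
    sentences = split_sentences(essay)
    results = []
    for item in rubric:
        evidence = find_evidence(llm, item, sentences)
        satisfied = assess(llm, item, evidence)
        comment = write_comment(llm, item, evidence, satisfied)
        results.append(ItemResult(item.criterion, evidence, satisfied, comment))
    return results


def stub_llm(prompt: str) -> str:
    # Deterministic offline stand-in for a real model, so the sketch runs as-is.
    task = prompt.split("\n", 1)[0]
    if task == "RELEVANT":
        criterion = prompt.split("Criterion: ", 1)[1].split("\n", 1)[0]
        keywords = [w for w in criterion.lower().split() if len(w) > 4]
        lines = prompt.split("Sentences:\n", 1)[1].split("\n")
        hits = [ln.split(":", 1)[0] for ln in lines if any(k in ln.lower() for k in keywords)]
        return ",".join(hits) or "NONE"
    if task == "ASSESS":
        return "YES" if prompt.split("Evidence:\n", 1)[1].strip() else "NO"
    if task == "COMMENT":
        return "Meets the criterion." if "Satisfied: YES" in prompt else "Does not yet meet the criterion."
    return ""


if __name__ == "__main__":
    essay = "A tariff raises the domestic price of steel. Consumers buy less steel."
    rubric = [RubricItem("Explains the effect of the tariff on price"),
              RubricItem("Discusses deadweight loss")]
    for r in generate_feedback(stub_llm, essay, rubric):
        print(r.criterion, "|", r.satisfied, "|", r.evidence, "|", r.comment)
```

Swapping `stub_llm` for a real model client only requires a function with the same string-in, string-out shape. Because each stage returns a separate value, a reviewer can check the highlighted evidence and the yes/no judgment before accepting or editing the final comment.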
