"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays

Shengxin Hong; Chang Cai; Sixuan Du; Haiyue Feng; Siyuan Liu; Xiuyi Fan

TL;DR

CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs, and offers a promising solution to the time and resource barriers that have limited the adoption of interactive feedback in educational settings.

Abstract

Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents), and then a Teacher Agent aggregates the evaluations through formal reasoning to generate feedback and grades. Students can further engage with the feedback to refine their understanding. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.
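As a rough illustration of how conflicting TA Agent evaluations could be aggregated through formal reasoning, the sketch below computes the grounded extension of a Dung-style abstract argumentation framework. The abstract does not say which argumentation semantics CAELF uses, so the choice of grounded semantics, the function name `grounded_extension`, and the argument labels are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Set, Tuple


def grounded_extension(arguments: Set[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an abstract argumentation framework.

    `attacks` holds (attacker, target) pairs. This is a generic textbook
    construction, assumed here for illustration only.
    """
    # Map each argument to the set of arguments attacking it.
    attackers: Dict[str, Set[str]] = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted: Set[str] = set()
    while True:
        # Arguments attacked by something already accepted are defeated.
        defeated = {dst for src, dst in attacks if src in accepted}
        # An argument is acceptable when every one of its attackers is defeated.
        acceptable = {a for a in arguments if attackers[a] <= defeated}
        if acceptable == accepted:
            return accepted
        accepted = acceptable


# Hypothetical arguments: two TA agents disagree, and one rebuts the other.
args = {"ta1_grade_B", "ta2_grade_C", "ta1_rebuttal"}
atts = {("ta2_grade_C", "ta1_grade_B"), ("ta1_rebuttal", "ta2_grade_C")}
initial = grounded_extension(args, atts)

# A student challenge that attacks the rebuttal changes the accepted set.
contested = grounded_extension(
    args | {"student_challenge"},
    atts | {("student_challenge", "ta1_rebuttal")},
)
```

In this toy setup the initial accepted set supports the first TA agent's grade, while adding the hypothetical student challenge shifts acceptance to the second agent's grade. That is one simple way a contested argument can propagate to a different outcome.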

Paper Structure (21 sections, 3 figures, 3 tables)

Introduction
Related Work and Background
LLMs for Essay Evaluation and Feedback
Contestable AI
Computational Argumentation
Framework Design and Implementation
LLM Discussion
Formal Reasoning for Feedback Generation
Interaction with User
Experiment Settings
Essay Dataset and Assessment Rubrics
Implementation
Baselines
Evaluation Metrics
Experiment Results
...and 6 more sections

Figures (3)

Figure 1: Diagram of our contestable AI empowered LLM framework for interactive feedback generation (CAELF).
Figure 2: An example of CAELF evaluation shows the process of interactive feedback, including discussions between the TA agents, argumentative reasoning by the teacher agent, initial feedback generation, and the student's challenge to the grade.
Figure 3: Human evaluation results, showing four human evaluation metrics for each feedback dimension. For example, Readability-Issue (RE-I) represents the readability of feedback in the issue dimension.