Grammatical Error Feedback: An Implicit Evaluation Approach
Stefano Bannò, Kate Knill, Mark J. F. Gales
TL;DR
The paper tackles the challenge of evaluating holistic grammatical error feedback (GEF) for second-language writing without relying on labor-intensive human annotations. It introduces an implicit evaluation framework based on grammatical lineups that pair feedback with essays across varying levels of manual correction, using LLM prompting to perform the matching. By comparing two lineup configurations (essay-type-based and feedback-based), the authors study whether GEC input is necessary and how lexical information influences matching accuracy, using the Cambridge Learner Corpus and multiple GEC/GEF models. Key findings show that GEC improves GEF quality, that GECToR can outperform GPT-4o in some settings, and that a feedback-based evaluation without lexical information reduces self-bias while retaining meaningful discrimination. The approach is cheap, flexible, and extensible to other languages and modalities, offering a practical pathway to scalable GEF systems in computer-assisted language learning (CALL).
Abstract
Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. We refer to this holistic feedback as grammatical error feedback (GEF). In this paper, we present a novel implicit evaluation approach to GEF that eliminates the need for manual feedback annotations. Our method adopts a grammatical lineup approach where the task is to pair feedback and essay representations from a set of possible alternatives. This matching process can be performed by appropriately prompting a large language model (LLM). An important aspect of this process, explored here, is the form of the lineup, i.e., the selection of foils. This paper exploits this framework to examine the quality of GEC and the need for it when generating feedback, as well as the system used to generate feedback, using essays from the Cambridge Learner Corpus.
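The lineup idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_lineup`, `lineup_accuracy`, `overlap_pick`) are hypothetical, foils are drawn uniformly at random (the paper treats foil selection as a design choice in its own right), and a simple word-overlap matcher stands in for the prompted LLM.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A matcher takes one feedback text and a lineup of essay representations,
# and returns the index of the essay it believes the feedback belongs to.
# In the paper this role is played by a prompted LLM; here it is any callable.
Matcher = Callable[[str, Sequence[str]], int]


def build_lineup(
    target_idx: int, essays: Sequence[str], n_foils: int, rng: random.Random
) -> Tuple[List[str], int]:
    """Return a shuffled lineup (target plus foils) and the target's position.

    Assumption: foils are sampled uniformly from the other essays.
    """
    foil_pool = [i for i in range(len(essays)) if i != target_idx]
    candidates = rng.sample(foil_pool, n_foils) + [target_idx]
    rng.shuffle(candidates)
    return [essays[i] for i in candidates], candidates.index(target_idx)


def lineup_accuracy(
    feedbacks: Sequence[str],
    essays: Sequence[str],
    pick: Matcher,
    n_foils: int = 3,
    seed: int = 0,
) -> float:
    """Fraction of lineups in which the matcher selects the true essay."""
    rng = random.Random(seed)
    correct = 0
    for idx, feedback in enumerate(feedbacks):
        lineup, answer = build_lineup(idx, essays, n_foils, rng)
        if pick(feedback, lineup) == answer:
            correct += 1
    return correct / len(feedbacks)


def overlap_pick(feedback: str, lineup: Sequence[str]) -> int:
    """Toy stand-in for the LLM: choose the essay sharing the most words."""
    fb_words = set(feedback.lower().split())
    scores = [len(fb_words & set(essay.lower().split())) for essay in lineup]
    return scores.index(max(scores))


# Toy data, invented for illustration only.
toy_essays = ["alpha text", "beta text", "gamma text", "delta text"]
toy_feedbacks = ["about alpha", "about beta", "about gamma", "about delta"]
print(lineup_accuracy(toy_feedbacks, toy_essays, overlap_pick))  # prints 1.0
```

Under this framing, higher matching accuracy is taken as implicit evidence that the feedback carries information specific to its essay, so no manually annotated reference feedback is required. The toy matcher relies on shared vocabulary, which is exactly the kind of lexical cue that the lineup design can include or withhold.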
