Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring

Heejin Do, Taehee Park, Sangwon Ryu, Gary Geunbae Lee

TL;DR

This work tackles cross-prompt automated essay scoring (AES) by learning prompt-general essay representations through grammar-aware learning. It introduces GAPS, a two-stage framework. The first stage applies grammatical error correction (GEC) to produce corrected essays. The second stage uses dual encoders with cross-attention knowledge sharing to predict trait scores from both the original and the corrected texts. The approach improves the scoring of prompt-independent traits and delivers strong cross-prompt performance, particularly in challenging unseen-prompt scenarios, which shows the value of syntactic cues for evaluating essays on unseen prompts. The findings suggest that grammar-corrected inputs can make AES systems more robust and generalizable in real-world educational settings.
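The TL;DR names "cross-attention knowledge sharing" between the two encoders but does not specify its form. Below is a minimal sketch that assumes standard scaled dot-product attention. In it, representations of the original essay act as queries and attend to representations of the grammar-corrected essay, which supply the keys and values. The function names, the toy 2-dimensional vectors, and the attention direction are illustrative assumptions, not the GAPS implementation. A trained model would also apply learned query/key/value projections, which are omitted here.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: every query attends over all keys.

    Hypothetical setup: queries come from the original-essay encoder,
    keys and values from the corrected-essay encoder.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


# Toy sentence-level representations (illustrative values only).
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
corrected = [[1.0, 0.0], [0.0, 1.0]]

shared = cross_attention(original, corrected, corrected)
for row in shared:
    print([round(x, 3) for x in row])
```

Each output row is a convex combination of the corrected-essay vectors. As a result, each sentence of the original essay draws most heavily on the corrected sentences it most resembles.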

Abstract

In automated essay scoring (AES), recent efforts have shifted toward cross-prompt settings, which score essays written for unseen prompts and are therefore more practical. However, prior methods trained on essay-score pairs from specific prompts struggle to learn prompt-generalized essay representations. In this work, we propose grammar-aware cross-prompt trait scoring (GAPS), which internally captures prompt-independent syntactic aspects to learn generic essay representations. We obtain grammatically corrected versions of the essays using grammatical error correction (GEC) and design the AES model to integrate this information seamlessly. By internally referring to both the corrected and the original essays, the model can focus on generic features during training. Empirical experiments validate our method's generalizability, showing remarkable improvements in prompt-independent and grammar-related traits. Furthermore, GAPS achieves notable quadratic weighted kappa (QWK) gains in the most challenging cross-prompt scenario, highlighting its strength in evaluating unseen prompts.
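The abstract reports gains in QWK (quadratic weighted kappa), the usual agreement metric between predicted and human scores in AES. To show how the metric works, here is a minimal pure-Python sketch of the standard definition. QWK is one minus the ratio of two quantities: the quadratically weighted observed disagreement, and the disagreement expected by chance from the two score distributions. The function name and the handling of the degenerate constant-score case are our own choices, not taken from the paper.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], min_rating: int, max_rating: int
) -> float:
    """Quadratic weighted kappa between two integer rating sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must exceed min_rating")
    n = max_rating - min_rating + 1

    # Observed confusion matrix: observed[i][j] counts pairs (true=i, pred=j).
    observed: List[List[float]] = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        if not (min_rating <= t <= max_rating and min_rating <= p <= max_rating):
            raise ValueError("rating outside [min_rating, max_rating]")
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]  # row marginals (human scores)
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]  # column marginals

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Degenerate case (our convention): both raters gave the same single
        # constant score, so there is no disagreement to measure.
        return 1.0
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # 1.0
print(round(quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3), 3))  # -1.0
```

Perfect agreement gives 1.0 and a fully reversed ranking gives -1.0. For real evaluations, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same statistic. It should match this sketch when its `labels` argument covers the full rating range.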

Paper Structure

This paper contains 18 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The overview of the proposed GAPS method.
  • Figure 2: QWK scores for traits evaluated in Prompt 7.