Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards

Heejin Do; Sangwon Ryu; Gary Geunbae Lee

TL;DR

This work tackles multi-trait automated essay scoring by addressing the non-differentiability of QWK through a scoring-aware reinforcement learning framework. It introduces Scoring-aware Multi-reward Reinforcement Learning (SaMRL), which uses an autoregressive score-generation model and two rewards (bi-directional QWK and a mean-trait squared error penalty), with the policy updated via PPO with KL regularization against a fixed anchor. Empirical results on the ASAP, ASAP++, and Feedback Prize datasets show state-of-the-art trait-wise performance across most prompts and trait sets, with ablations confirming the benefits of multi-reward optimization and dynamic weight learning. The approach is robust across varying prompt types and data sizes; the authors note limitations related to trait-prediction order and suggest per-token policy updates as a possible future improvement.

Abstract

Recent advances in automated essay scoring (AES) have shifted towards evaluating multiple traits to provide enriched feedback. Like typical AES systems, multi-trait AES employs the quadratic weighted kappa (QWK) to measure agreement with human raters, aligning closely with the rating schema; however, its non-differentiable nature prevents its direct use in neural network training. In this paper, we propose Scoring-aware Multi-reward Reinforcement Learning (SaMRL), which integrates actual evaluation schemes into the training process by designing QWK-based rewards with a mean-squared error penalty for multi-trait AES. Existing reinforcement learning (RL) applications in AES are limited to classification models despite associated performance degradation, as RL requires probability distributions; instead, we adopt an autoregressive score generation framework to leverage token generation probabilities for robust multi-trait score predictions. Empirical analyses demonstrate that SaMRL facilitates model training, notably enhancing scoring of previously inferior prompts.
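To make the metric concrete, the following is a minimal sketch of the standard quadratic weighted kappa computation for integer scores on a known range. It is not the authors' implementation; the function name `quadratic_weighted_kappa` and its parameters are illustrative assumptions. Because the metric is built from counts of discrete score pairs, it changes only in jumps as predictions change, which is why it cannot serve directly as a gradient-based training loss.

```python
def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Standard QWK between two lists of integer scores (illustrative sketch)."""
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of distinct score categories
    if n < 2:
        raise ValueError("need at least two score categories")
    total = len(human)

    # Observed agreement matrix: rows are human scores, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    # Marginal histograms of each rater.
    hist_h = [sum(row) for row in observed]
    hist_p = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            expected = hist_h[i] * hist_p[j] / total  # chance-level count
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters gave one identical score to every essay. Returning 1.0
        # here is a convention chosen for this sketch, not a universal rule.
        return 1.0
    return 1.0 - numerator / denominator
```

For example, `quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)` corresponds to perfect agreement, while fully reversed scores on a three-point scale give the minimum value.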

Paper Structure (29 sections, 9 equations, 5 figures, 6 tables)

Introduction
Related works
Multi-trait essay scoring
RL for text generation
Preliminary
SaMRL
Score generation model
Multi-rewards function
QWK
MSE
RL policy update
Multi-objective Optimization
Experimental setup
Datasets
Evaluations
...and 14 more sections

Figures (5)

Figure 1: Overview of distinct AES frameworks. The autoregressive framework eliminates the need for multiple trait-wise layers. Classification and autoregressive AES models probabilistically predict final scores; hence, a policy gradient reinforcement algorithm is applicable.
Figure 2: Overview of the entire process for the proposed autoregressive multi-trait AES with SaMRL. We maintain the structure of the score generation within the policy model through token-wise KL regularization and allow the model to align with human judgment by introducing multiple scoring-aware rewards.
Figure 3: Comparison of performance between different prompt types with varying trait compositions. Prompts 1, 2, and 8 are evaluated on one shared set of traits, while Prompts 3-6 are assessed on a different shared set.
Figure 4: Comparison results of classification-based RL models and our SaMRL (★) for the Overall score prediction. $CLS_{+RL}$ and $CLS_{DI+RL}$ are models where RL is applied to $CLS$ and $CLS_{DI}$, respectively.
Figure 5: Variations in the updated weights of $loss_{R_Q}$ ($W_{QWK}$) and $loss_{R_M}$ ($W_{MSE}$) across training steps (left); comparison of prompt-wise averaged QWK performance between models with fixed weights and our SaMRL with trainable weights (right).
