Autoregressive Score Generation for Multi-trait Essay Scoring

Heejin Do, Yunsu Kim, Gary Geunbae Lee

TL;DR

This work proposes an autoregressive prediction of multi-trait scores (ArTS), incorporating a *decoding* process by leveraging the pre-trained T5, allowing a single model to predict multiple scores.

Abstract

Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing encoder-only approach, we propose an autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES as a score-generation task, allowing a single model to predict multiple scores. During decoding, each subsequent trait prediction can benefit from conditioning on the preceding trait scores. Experimental results demonstrate the efficacy of ArTS, with average improvements of over 5% across both prompts and traits.
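
To make the score-generation framing concrete, the following is a minimal sketch of how multi-trait labels might be serialized into a single target string for a text-to-text model, and how a generated string might be parsed back into scores. The trait names, their order, the `trait value` format, and the function names `scores_to_target` and `target_to_scores` are all illustrative assumptions, not the paper's exact specification. The only detail taken from the source is that unlabeled traits are written as `nan`.

```python
# Hypothetical trait names and order, chosen only for illustration.
TRAITS = [
    "content",
    "organization",
    "word choice",
    "sentence fluency",
    "conventions",
    "overall",
]


def scores_to_target(scores):
    """Serialize trait scores into one target string.

    Traits missing from `scores` (or set to None) are written as "nan",
    so every essay yields a sequence over the same fixed trait order.
    """
    parts = []
    for trait in TRAITS:
        value = scores.get(trait)
        parts.append(f"{trait} {'nan' if value is None else value}")
    return ", ".join(parts)


def target_to_scores(text):
    """Parse a generated string back into a {trait: score or None} dict."""
    parsed = {}
    for chunk in text.split(","):
        chunk = chunk.strip()
        # Split on the last space so multi-word trait names stay intact.
        name, _, value = chunk.rpartition(" ")
        if not name:
            continue  # skip malformed fragments with no trait name
        try:
            parsed[name] = None if value == "nan" else int(value)
        except ValueError:
            parsed[name] = None  # treat unparseable scores as missing
    return parsed


example = {"content": 4, "organization": 3, "overall": 8}
target = scores_to_target(example)
recovered = target_to_scores(target)
```

Because the traits are emitted left to right in a fixed order, a decoder generating this string sees the earlier trait scores as context when it produces each later one, which is the conditioning effect the abstract describes.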

Paper Structure (19 sections, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Proposed autoregressive multi-trait essay scoring by fine-tuning T5. The example is an essay written for prompt 1, which has labeled scores for six traits. Trait scores that are unlabeled for a prompt are set to nan.
  • Figure 2: Results of ArTS with Llama2-13B and comparison with the baseline and ArTS with T5 models.