Autoregressive Score Generation for Multi-trait Essay Scoring

Heejin Do; Yunsu Kim; Gary Geunbae Lee

TL;DR

This work proposes an autoregressive prediction of multi-trait scores (ArTS), incorporating a *decoding* process by leveraging the pre-trained T5, allowing a single model to predict multiple scores.

Abstract

Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing encoder-only approach, we propose an autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES as a score-generation task, allowing a single model to predict multiple scores. During decoding, each subsequent trait prediction can benefit from conditioning on the preceding trait scores. Experimental results demonstrate the efficacy of ArTS, showing average improvements of over 5% across both prompts and traits.
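To make the score-generation framing concrete, the sketch below shows one way a set of trait scores could be serialized into a single target string for a text-to-text model, and parsed back after generation. It is only an illustration under stated assumptions: the trait names, their order, the `trait value` text format, and the helper names `to_target` and `from_output` are all hypothetical and are not taken from the paper. The only detail borrowed from the source is the Figure 1 note that unlabeled trait scores are set as nan.

```python
import math

# Hypothetical trait order (an assumption for illustration; the paper's
# actual traits and their ordering are not specified in this section).
TRAITS = ["content", "organization", "conventions", "overall"]


def to_target(scores):
    """Serialize a trait->score dict into one target string.

    Traits missing from `scores` are written as 'nan'.
    """
    parts = []
    for trait in TRAITS:
        value = scores.get(trait)
        parts.append(f"{trait} {'nan' if value is None else value}")
    return ", ".join(parts)


def from_output(text):
    """Parse a generated string back into a trait->score dict.

    'nan' values and malformed or unknown items are returned as None.
    """
    result = {trait: None for trait in TRAITS}
    for item in text.split(","):
        pieces = item.strip().rsplit(" ", 1)
        if len(pieces) != 2 or pieces[0] not in result:
            continue  # skip items that do not look like "<trait> <value>"
        try:
            value = float(pieces[1])
        except ValueError:
            continue  # non-numeric score text is ignored
        result[pieces[0]] = None if math.isnan(value) else value
    return result
```

Because all scores sit in one output sequence, a decoder that generates this string token by token produces each later trait score after the earlier ones, which is the conditioning effect the abstract describes.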

Paper Structure (19 sections, 2 figures, 6 tables)

Introduction
Related Work
Autoregressive Essay Multi-trait Scoring (ArTS)
Fine-tuning T5
Score extraction
Experiment
Datasets and settings
Evaluation and validation
Results
Main results
Prompt number guidance
Trait prediction order
Feedback Prize dataset
Decoder-only LLM
Comparison with BERT-based models
...and 4 more sections

Figures (2)

Figure 1: Proposed autoregressive multi-trait essay scoring via fine-tuning of T5. The example is an essay written for prompt 1, which has labeled scores for six traits. Trait scores that are unlabeled for the prompt are set to nan.
Figure 2: Results of ArTS with Llama2-13B and comparison with the baseline and ArTS with T5 models.
