Rank-Then-Score: Enhancing Large Language Models for Automated Essay Scoring

Yida Cai; Kun Liang; Sanwoo Lee; Qinghan Wang; Yunfang Wu

TL;DR

This paper proposes Rank-Then-Score (RTS), a fine-tuning framework that enhances the essay scoring capabilities of large language models. RTS consistently outperforms the direct prompting method in terms of average QWK across all LLMs and datasets, and achieves the best performance on Chinese essay scoring using the HSK dataset.

Abstract

In recent years, large language models (LLMs) have achieved remarkable success across a variety of tasks. However, their potential in Automated Essay Scoring (AES) remains largely underexplored. Moreover, compared with English, methods for Chinese AES are not well developed. In this paper, we propose Rank-Then-Score (RTS), a fine-tuning framework based on large language models to enhance their essay scoring capabilities. Specifically, we fine-tune a ranking model (Ranker) on feature-enriched data, then feed its output, in the form of a candidate score set, together with the essay content into a scoring model (Scorer) to produce the final score. Experimental results on two benchmark datasets, HSK and ASAP, demonstrate that RTS consistently outperforms the direct prompting (Vanilla) method in terms of average quadratic weighted kappa (QWK) across all LLMs and datasets, and achieves the best performance on Chinese essay scoring using the HSK dataset.
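Since the results are reported in QWK, a minimal sketch of how that metric is commonly computed may help. This is the standard quadratic weighted kappa definition, not code from the paper; the function name, its arguments, and the convention of returning 1.0 when every score falls in a single category are assumptions made for illustration.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """Standard quadratic weighted kappa between two lists of integer scores.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of distinct score levels
    if n < 2:
        raise ValueError("need at least two score levels")
    total = len(y_true)

    # Observed agreement matrix: rows are gold scores, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_score][p - min_score] += 1

    # Marginal histograms of gold and predicted scores.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement penalty
            expected = hist_true[i] * hist_pred[j] / total  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Assumed convention: all scores sit in one shared category.
        return 1.0
    return 1.0 - numerator / denominator


# Gold scores vs. predictions on a 1-3 scale.
kappa = quadratic_weighted_kappa([1, 2, 3], [1, 2, 2], min_score=1, max_score=3)
```

A value of 1 indicates perfect agreement with the gold scores, 0 indicates chance-level agreement, and negative values indicate agreement worse than chance. Because the penalty grows with the square of the score distance, predictions that miss by one point cost far less than predictions that miss by several.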
