Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression

Kun Sun, Rong Wang

TL;DR

This work addresses the need for dimension-specific scoring in automated essay assessment by proposing Automatic Essay Multi-dimensional Scoring (AEMS), a fine-tuned, transformer-based framework. It combines dual heads (classification and regression) with contrastive learning and is trained on two large datasets, ELLIPSE and IELTS, to produce scores across multiple dimensions (e.g., vocabulary, grammar, coherence) alongside an overall score. Per-dimension precision, F1, and Quadratic Weighted Kappa (QWK) values often exceed 0.8, and the approach generalizes robustly across datasets. The work has practical value for multilingual writing assessment, and the models are available for further research and deployment.

Abstract

Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing AES systems produce only a single overall score. However, users and L2 learners expect scores across different dimensions (e.g., vocabulary, grammar, coherence) for English essays in real-world applications. To address this need, we have developed two models that automatically score English essays across multiple dimensions by employing fine-tuning and other strategies on two large datasets. The results demonstrate that our systems achieve impressive performance in evaluation using three criteria: precision, F1 score, and Quadratic Weighted Kappa. Furthermore, our system outperforms existing methods in overall scoring.
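
Of the three evaluation criteria named above, Quadratic Weighted Kappa is the least self-explanatory, so a minimal sketch of its standard definition may help. It measures agreement between two sets of ordinal ratings and penalizes each disagreement by the squared distance between the two ratings. This is not the authors' evaluation code: the function name `quadratic_weighted_kappa` and its parameters are illustrative assumptions, and the sketch assumes integer ratings (half-point bands would first need to be mapped to integer indices).

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Standard QWK between two equal-length lists of integer ratings.

    Illustrative sketch only; names and the integer-rating assumption
    are not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n_cat = max_rating - min_rating + 1
    if n_cat < 2:
        raise ValueError("at least two rating categories are required")
    n = len(y_true)

    # Observed agreement matrix: rows are true ratings, columns are predictions.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    # Marginal histograms, used to build the chance-agreement matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n

    if weighted_expected == 0.0:
        # Both raters used one identical category throughout; this sketch
        # treats that as perfect agreement (a convention, not a standard).
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Under this definition, identical ratings give a value of 1, agreement at chance level gives 0, and systematically reversed ratings give a negative value.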

Paper Structure (10 sections, 7 equations, 1 figure, 5 tables)

Figures (1)

  • Figure 1: The roadmap of developing AEMU in the present study