Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression
Kun Sun, Rong Wang
TL;DR
This work addresses the need for dimension-specific scoring in automated essay assessment by proposing Automatic Essay Multi-dimensional Scoring (AEMS), a framework built on fine-tuned transformer models. It combines dual heads (classification and regression) with contrastive learning and is trained on two large datasets, ELLIPSE and IELTS, to produce scores across multiple dimensions (e.g., vocabulary, grammar, coherence) alongside an overall score. The approach achieves strong performance, with per-dimension precision, F1, and Quadratic Weighted Kappa (QWK) values often exceeding 0.8, and generalizes well across datasets. The work has practical value for multilingual writing assessment, and the trained models are made available for further research and deployment.
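To make the dual-head idea concrete, the sketch below combines a classification loss and a regression loss for a single scoring dimension. It is an illustration under stated assumptions, not the authors' implementation: the function names, the mixing weight `alpha`, and the use of cross-entropy plus squared error are all hypothetical choices, and the contrastive term mentioned above is omitted.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over `logits` against one gold class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def dual_head_loss(class_logits, reg_output, gold_score, score_levels, alpha=0.5):
    """Hypothetical combined loss for one dimension (e.g. grammar).

    class_logits : one logit per discrete score level (classification head)
    reg_output   : a single real-valued prediction (regression head)
    gold_score   : the human-assigned score; must appear in `score_levels`
    alpha        : assumed mixing weight between the two heads
    """
    target_index = score_levels.index(gold_score)
    ce = softmax_cross_entropy(class_logits, target_index)
    se = (reg_output - gold_score) ** 2
    return alpha * ce + (1.0 - alpha) * se
```

In a multi-dimensional setting, one such loss could be computed per dimension and summed; how the original system weights or shares the heads is not stated in this summary.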
Abstract
Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing AES systems produce only a single overall score. However, in real-world applications, users and L2 learners expect scores for English essays across different dimensions (e.g., vocabulary, grammar, coherence). To address this need, we developed two models that automatically score English essays across multiple dimensions by employing fine-tuning and other strategies on two large datasets. The results show that our models perform strongly on three evaluation criteria: precision, F1 score, and Quadratic Weighted Kappa (QWK). Furthermore, our models outperform existing methods in overall scoring.
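Because Quadratic Weighted Kappa is the least familiar of the three criteria, a minimal reference sketch of the standard formula may help: $\kappa = 1 - \sum_{i,j} w_{ij} O_{ij} / \sum_{i,j} w_{ij} E_{ij}$ with $w_{ij} = (i-j)^2/(N-1)^2$, where $O$ is the observed rating matrix and $E$ the matrix expected from the two raters' marginals. The function name and the integer-rating assumption are ours; fractional scores such as half-band scores would first need to be mapped to integer category indices.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """QWK between two lists of integer ratings in [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two rating categories")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("rating lists must be non-empty and of equal length")

    # Observed matrix: rows are human ratings, columns are model ratings.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one identical category; kappa is undefined here,
        # and returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement with the human ratings, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.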
