Automatic Essay Scoring in a Brazilian Scenario
Felipe Akio Matsuoka
TL;DR
Addresses the challenge of scalable, fair automatic essay scoring (AES) for Portuguese-language essays from Brazil's ENEM exam by training a BERT-based regression model that takes the essay theme and the essay text as input. The BERT_ENEM_Regression model, built on BERTimbau base with a five-output head, achieves a total $QWK$ of 0.79 and a total $RMSE$ of 90.96 on a held-out set, outperforming prior baselines on the Essay-br dataset. The work also discusses limitations arising from grammar sensitivity and dataset skew, and suggests future work on data diversification and grammar-aware signals. Overall, the results indicate that transformer-based AES can handle high-volume exams while aligning closely with human scoring criteria.
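The two metrics quoted above, quadratic weighted kappa (QWK) and root mean squared error (RMSE), can be illustrated with a minimal pure-Python sketch. This is not the paper's evaluation code. The function names are hypothetical, and the band layout (scores from 0 to 200 in steps of 40 per competency) and the rounding of continuous regression outputs to the nearest band are assumptions made only for illustration.

```python
import math

# Assumed score bands for one competency (0 to 200 in steps of 40).
# This layout is an assumption for the example, not taken from the text above.
BANDS = [0, 40, 80, 120, 160, 200]


def round_to_band(value, step=40, low=0, high=200):
    """Map a continuous regression output to the nearest assumed band."""
    snapped = int(round(value / step)) * step
    return max(low, min(high, snapped))


def rmse(human, model):
    """Root mean squared error between two equal-length score lists."""
    squared = [(h - m) ** 2 for h, m in zip(human, model)]
    return math.sqrt(sum(squared) / len(squared))


def quadratic_weighted_kappa(human, model, labels):
    """QWK between two lists of ordinal scores drawn from `labels`.

    Undefined (division by zero) when there is a single label or when
    both raters assign one identical label to every essay.
    """
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    n = len(human)

    # Observed agreement matrix: rows are human scores, columns model scores.
    observed = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, model):
        observed[index[h]][index[m]] += 1.0

    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty grows with the squared distance between bands.
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            numerator += weight * observed[i][j]
            # Expected count under independence of the two raters.
            denominator += weight * hist_h[i] * hist_m[j] / n
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    human_scores = [120, 160, 80, 200, 40]
    raw_outputs = [131.2, 149.0, 95.0, 188.4, 62.7]  # made-up model outputs
    banded = [round_to_band(x) for x in raw_outputs]
    print("RMSE:", rmse(human_scores, raw_outputs))
    print("QWK:", quadratic_weighted_kappa(human_scores, banded, BANDS))
```

QWK penalizes disagreements by the squared distance between score bands, so a two-band miss costs four times as much as a one-band miss, while RMSE measures the average size of the error on the raw score scale. The two are therefore usually reported together.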
Abstract
This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored to the Portuguese-language essays of Brazil's Exame Nacional do Ensino Médio (ENEM), addressing the challenges of traditional human grading. Our approach uses deep learning to align closely with human grading criteria, targeting efficiency and scalability in evaluating large volumes of student essays. The research responds to the logistical and financial constraints of manual grading in Brazilian educational assessments and promises to improve fairness and consistency in scoring, marking a significant step forward in the application of AES in large-scale academic settings.
