Automatic Essay Scoring in a Brazilian Scenario

Felipe Akio Matsuoka

TL;DR

This work addresses the challenge of scalable, fair automatic essay scoring (AES) for the Portuguese-language essays of Brazil's ENEM exam by training a BERT-based regression model that takes the essay theme and the essay text as input. The BERT_ENEM_Regression model, built on BERTimbau base with a 5-output regression head, achieves a total QWK of 0.79 and a total RMSE of 90.96 on a held-out set, outperforming prior baselines on the Essay-br dataset. The work also discusses limitations related to grammar sensitivity and dataset skew, and suggests future improvements through more diverse data and grammar-aware signals. Overall, the approach demonstrates that transformer-based AES can scale to large exams while aligning closely with human scoring criteria.
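ENEM grades each essay on five competencies, each scored from 0 to 200 in steps of 40, and the total score (0 to 1000) is their sum. A 5-output head presumably emits one continuous value per competency. The sketch below shows one plausible way to turn such outputs into valid grades and a total; the clipping, the half-up rounding to the nearest multiple of 40, and the names `snap_to_grade` and `to_enem_grades` are hypothetical assumptions, not the paper's documented post-processing.

```python
import math

# ASSUMPTION: raw model outputs are on the 0-200 per-competency scale.
# This post-processing is a hypothetical illustration, not the paper's method.
STEP = 40          # ENEM competency grades move in steps of 40
MAX_GRADE = 200    # maximum grade per competency
N_COMPETENCIES = 5


def snap_to_grade(value: float) -> int:
    """Clip a raw output to [0, 200] and round half-up to the nearest multiple of 40."""
    clipped = min(max(value, 0.0), float(MAX_GRADE))
    # math.floor(x + 0.5) rounds ties upward, avoiding Python's round-half-to-even.
    return math.floor(clipped / STEP + 0.5) * STEP


def to_enem_grades(raw_outputs):
    """Map five raw regression outputs to competency grades and their total (0-1000)."""
    if len(raw_outputs) != N_COMPETENCIES:
        raise ValueError("expected one output per competency (5 total)")
    grades = [snap_to_grade(v) for v in raw_outputs]
    return grades, sum(grades)


grades, total = to_enem_grades([183.2, 121.0, 95.7, 210.4, -3.0])
print(grades, total)  # [200, 120, 80, 200, 0] 600
```

Half-up rounding is used deliberately: Python's built-in `round` rounds ties to the nearest even integer, which would send 20 to 0 but 60 to 80.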

Abstract

This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored to the Portuguese-language essays of Brazil's Exame Nacional do Ensino Médio (ENEM), addressing the challenges of traditional human grading. Our approach uses transformer-based deep learning to align closely with human grading criteria, targeting efficiency and scalability when evaluating large volumes of student essays. This research responds to the logistical and financial constraints of manual grading in Brazilian educational assessments and aims to improve fairness and consistency in scoring, marking a significant step forward for the application of AES in large-scale academic settings.

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, and 4 tables.

Introduction
Materials and Methods
Dataset Overview
Exploratory Data Analysis
Preprocessing
Model Development
Model structure
Tokenization Process
Results
Discussion

Figures (3)

Figure 1: Histogram of total grades
Figure 2: The Quadratic Weighted Kappa (QWK) measures the agreement between two raters. Here, $w_{ij}$ is the weight for a disagreement between the $i$-th and $j$-th categories, $o_{ij}$ is the observed number of items rated $i$ by one rater and $j$ by the other, and $e_{ij}$ is the number expected under chance agreement.
Figure 3: The Root Mean Squared Error (RMSE) quantifies the difference between predicted values $\hat{y}_i$ and observed values $y_i$, averaged over $n$ observations.
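The two equations themselves are not reproduced here. Assuming they follow the standard textbook forms, QWK is $\kappa = 1 - \frac{\sum_{i,j} w_{ij} o_{ij}}{\sum_{i,j} w_{ij} e_{ij}}$ with $w_{ij} = \frac{(i-j)^2}{(N-1)^2}$ for $N$ ordered categories, and RMSE is $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$. The stdlib-only sketch below implements both under those assumptions. The label set passed to `quadratic_weighted_kappa` is also an assumption: the paper's exact binning of total scores for QWK is not stated here.

```python
import math


def quadratic_weighted_kappa(rater_a, rater_b, labels):
    """Cohen's kappa with quadratic weights over an ordered label set.

    `labels` lists every possible grade in ascending order, e.g.
    [0, 40, 80, 120, 160, 200] for one ENEM competency (assumed binning).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must give the same, non-zero number of grades")
    n_cat = len(labels)
    if n_cat < 2:
        raise ValueError("need at least two categories")
    index = {label: k for k, label in enumerate(labels)}

    # o[i][j]: number of items rater A placed in category i and rater B in category j.
    o = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        o[index[a]][index[b]] += 1

    n = len(rater_a)
    row_totals = [sum(row) for row in o]                                     # rater A histogram
    col_totals = [sum(o[i][j] for i in range(n_cat)) for j in range(n_cat)]  # rater B histogram

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = (i - j) ** 2 / (n_cat - 1) ** 2    # quadratic disagreement weight w_ij
            e = row_totals[i] * col_totals[j] / n  # chance count e_ij from the marginals
            weighted_observed += w * o[i][j]
            weighted_expected += w * e
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1.0 - weighted_observed / weighted_expected


def rmse(y_true, y_pred):
    """Root mean squared error between observed values y_i and predictions y_hat_i."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


print(quadratic_weighted_kappa([0, 40, 80], [0, 80, 80], labels=[0, 40, 80]))  # ≈ 0.8
print(rmse([0, 40], [40, 40]))  # sqrt(800), about 28.28
```

Because the weights grow with the squared distance between categories, a prediction two grade levels off is penalized four times as heavily as one a single level off, which is why QWK suits ordinal grades better than plain accuracy.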
