Automatic Essay Scoring in a Brazilian Scenario

Felipe Akio Matsuoka

TL;DR

This paper addresses the challenge of scalable, fair AES for Portuguese-language essays in Brazil's ENEM by training a BERT-based regression model that takes the essay theme and the essay text as input. The BERT_ENEM_Regression model, built on BERTimbau base with a five-output head, achieves a total QWK of 0.79 and a total RMSE of 90.96 on a held-out set, outperforming prior baselines on the Essay-br dataset. The work also discusses limitations stemming from grammar sensitivity and dataset skew, and suggests future work on data diversification and grammar-aware signals. Overall, the results indicate that transformer-based AES can scale to large exams while aligning closely with human scoring criteria.

Abstract

This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored for the Portuguese-language essays of Brazil's Exame Nacional do Ensino Médio (ENEM), addressing the challenges in traditional human grading systems. Our approach leverages advanced deep learning techniques to align closely with human grading criteria, targeting efficiency and scalability in evaluating large volumes of student essays. This research not only responds to the logistical and financial constraints of manual grading in Brazilian educational assessments but also promises to enhance fairness and consistency in scoring, marking a significant step forward in the application of AES in large-scale academic settings.

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Histogram of total grades
  • Figure 2: The Quadratic Weighted Kappa (QWK) measures the agreement between two raters. Here, $w_{ij}$ is the weight for a disagreement between the $i$-th and $j$-th categories, $o_{ij}$ is the observed agreement, and $e_{ij}$ is the expected agreement under chance.
  • Figure 3: The Root Mean Squared Error (RMSE) quantifies the difference between predicted values $\hat{y}_i$ and observed values $y_i$, averaged over $n$ observations.
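
The two metrics captioned in Figures 2 and 3 can be illustrated with a short sketch. The figures themselves are not reproduced here, so the code assumes the standard definitions: $\kappa = 1 - \sum_{i,j} w_{ij} o_{ij} / \sum_{i,j} w_{ij} e_{ij}$ with $w_{ij} = (i-j)^2 / (N-1)^2$ for $N$ categories, and $RMSE = \sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$. The function names and the assumption that scores are already mapped to integer category indices are illustrative and not taken from the paper.

```python
import math


def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """QWK for two equal-length sequences of integer labels in [0, num_categories)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating sequences must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    k = num_categories

    # o_ij: how often rater A chose category i while rater B chose category j.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms, used to build the chance-level expectation e_ij.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2    # w_ij
            expected = hist_a[i] * hist_b[j] / n    # e_ij
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters used the same single category for every item.
        return 1.0
    return 1.0 - numerator / denominator


def rmse(predicted, observed):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - y) ** 2 for p, y in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

To use this sketch with graded essays, scores would first be mapped to consecutive category indices before calling `quadratic_weighted_kappa`, while `rmse` can be applied to the raw numeric scores directly. How the paper aggregates per-competency values into the reported totals is not specified in this excerpt.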