Frustratingly Simple Prompting-based Text Denoising

Jungyeul Park; Mengyang Qiu

TL;DR

The paper investigates whether simple preprocessing via text denoising can improve automated essay scoring (AES) on the ASAP dataset. It applies prompt-based corrections, using two GPT-3.5 prompts to fix encoding errors and replace non-word entities, and uses ERRANT .m2 annotations to recover the corrected words. A RoBERTa-base linear regression model is then trained and evaluated with 8-fold cross-validation across the eight ASAP prompts, reporting Quadratic Weighted Kappa (QWK) and perplexity. The findings show modest but consistent QWK gains over the original texts, which supports the view that improving data quality can boost AES performance even with simple models. The work highlights the practical value of dataset preprocessing and motivates future exploration of richer features and modeling approaches.
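For readers unfamiliar with the metric, QWK can be sketched in a few lines. This is a minimal illustration of the standard quadratic weighted kappa definition, not the authors' evaluation code; the function name and the `min_rating`/`max_rating` parameters are assumptions made for the example.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two distinct rating levels")

    # Observed confusion matrix: rows are true ratings, columns are predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used a single identical rating; treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement with the human scores, 0 means chance-level agreement, and negative values mean systematic disagreement.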

Abstract

This paper introduces a novel perspective on the automated essay scoring (AES) task, challenging the conventional view of the ASAP dataset as a static entity. Employing simple text denoising techniques using prompting, we explore the dynamic potential within the dataset. While acknowledging the previous emphasis on building regression systems, our paper underscores how making minor changes to a dataset through text denoising can enhance the final results.

Paper Structure (7 sections, 1 figure, 3 tables)

This paper contains 7 sections, 1 figure, 3 tables.

Introduction
Text denoising
Experiments and Results
Discussion and Conclusion
Appendix
Restoring the original text from gpt-corrected text
Detailed experiment results

Figures (1)

Figure 1: Example of text denoising: the Unicode symbol U+0092 is replaced with ', and @CAPS2 is substituted with an arbitrary word
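The two kinds of noise named in the caption can be illustrated with a small rule-based stand-in. The paper performs these corrections by prompting GPT-3.5; the sketch below does not reproduce that method. The `FILLERS` table, the `default` word, and the `denoise` function name are all hypothetical choices made only to show the input and output shape of the task.

```python
import re

# ASAP anonymization placeholders look like @CAPS2 or @PERSON1.
PLACEHOLDER = re.compile(r"@([A-Z]+)(\d*)")

# Hypothetical filler words per placeholder type (not taken from the paper).
FILLERS = {"CAPS": "Internet", "PERSON": "Alex", "LOCATION": "Boston"}


def denoise(text, default="something"):
    """Rule-based stand-in for the prompt-based denoising step."""
    # U+0092 is a C1 control character; it usually comes from the
    # Windows-1252 byte 0x92, which encodes a right single quotation mark.
    text = text.replace("\u0092", "'")
    # Replace each placeholder with an arbitrary word for its type.
    return PLACEHOLDER.sub(lambda m: FILLERS.get(m.group(1), default), text)


print(denoise("I don\u0092t think @CAPS2 is bad for people."))
```

A fixed lookup table cannot choose a word that fits the surrounding sentence, which is one reason a language model is a natural choice for the substitution step.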
