Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection

Xinlin Peng, Ying Zhou, Ben He, Le Sun, Yingfei Sun

TL;DR

This work examines how vulnerable AI-generated content (AIGC) detectors are to adversarial perturbations in the education domain. It introduces AIG-ASAP, an AI-generated student-essay benchmark built from ASAP prompts, together with three perturbation schemes (paraphrasing, word substitution, and sentence substitution) for testing detector robustness. Empirical results show that although detectors such as RoBERTa-QA identify unperturbed AI-generated essays with high accuracy, targeted perturbations can sharply reduce detection performance: word-level substitution brings accuracy close to the random baseline, and paraphrasing yields stronger evasion for certain prompts. The study underscores the need for more robust, domain-specific detection methods and provides a benchmark for ongoing evaluation.
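The TL;DR compares detection accuracy against a random baseline. As a minimal sketch of that metric, assuming a detector that outputs the probability that an essay is AI-generated and a fixed 0.5 decision threshold (both assumptions, not details given here), accuracy is the fraction of essays whose thresholded score matches the label. On a balanced human/AI set, uniform random guessing has an expected accuracy of 0.5. The function name `detection_accuracy` and the toy scores are hypothetical illustrations, not values from the paper.

```python
from typing import Sequence


def detection_accuracy(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> float:
    """Fraction of essays whose thresholded detector score matches its label.

    scores: hypothetical detector outputs, P(essay is AI-generated).
    labels: 1 = AI-generated, 0 = human-written.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


# Toy numbers for illustration only (not results from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # balanced, so random guessing scores ~0.5
clean = [0.97, 0.91, 0.88, 0.95, 0.05, 0.10, 0.02, 0.20]
perturbed = [0.40, 0.30, 0.62, 0.45, 0.05, 0.10, 0.02, 0.20]

print(detection_accuracy(clean, labels))      # 1.0
print(detection_accuracy(perturbed, labels))  # 0.625
```

If a perturbation pushed every AI-generated essay below the threshold while human essays were still classified correctly, accuracy on this balanced set would fall to exactly 0.5, which is one way a detector can end up at the random baseline.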

Abstract

Large language models (LLMs) have exhibited remarkable capabilities in text generation tasks. However, their use carries inherent risks, including plagiarism, the dissemination of fake news, and misuse in educational exercises. Although several detectors have been proposed to address these concerns, their effectiveness against adversarial perturbations, specifically in the context of student essay writing, remains largely unexplored. This paper aims to bridge this gap by constructing AIG-ASAP, an AI-generated student essay dataset, using a range of text perturbation methods designed to produce high-quality essays that evade detection. Through empirical experiments, we assess the performance of current AIGC detectors on the AIG-ASAP dataset. The results reveal that existing detectors can be easily circumvented by straightforward automatic adversarial attacks. Specifically, we explore word substitution and sentence substitution perturbation methods that effectively evade detection while maintaining the quality of the generated essays. This highlights the urgent need for more accurate and robust methods to detect AI-generated student essays in the education domain.
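The abstract names word substitution as a perturbation that evades detection. This summary does not spell out the procedure, so the following is only a generic sketch of a detector-guided greedy attack, not necessarily the authors' algorithm. It assumes black-box access to a scoring function `ai_score` (returning the probability that a token list is AI-generated) and a hypothetical table `candidates` of replacement words such as synonyms. At each step it applies the single swap that lowers the score most; `max_subs` plays the role of a perturbation depth like the one on the X-axis of Figure 2. The sketch does not check essay quality.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical detector interface: tokens -> P(AI-generated).
Detector = Callable[[Sequence[str]], float]


def greedy_word_substitution(
    tokens: Sequence[str],
    candidates: Dict[str, List[str]],
    ai_score: Detector,
    max_subs: int,
) -> Tuple[List[str], float]:
    """Greedily swap up to `max_subs` words to lower the detector's AI score."""
    current = list(tokens)
    best_score = ai_score(current)
    touched: Set[int] = set()  # each position is substituted at most once
    for _ in range(max_subs):
        best_move = None
        for i, word in enumerate(current):
            if i in touched:
                continue
            for alt in candidates.get(word.lower(), []):
                trial = current[:i] + [alt] + current[i + 1:]
                score = ai_score(trial)
                if score < best_score:  # strictly better than anything seen so far
                    best_score, best_move = score, (i, alt)
        if best_move is None:  # no single swap lowers the score any further
            break
        i, alt = best_move
        current[i] = alt
        touched.add(i)
    return current, best_score


# Toy stand-in for a real classifier: share of "AI-flavored" words.
AI_WORDS = {"moreover", "utilize", "delve"}


def toy_score(toks: Sequence[str]) -> float:
    return sum(t in AI_WORDS for t in toks) / len(toks)


essay = "moreover we utilize tools to delve deeper".split()
subs = {"moreover": ["also"], "utilize": ["use"], "delve": ["dig"]}
print(greedy_word_substitution(essay, subs, toy_score, max_subs=2))
# (['also', 'we', 'use', 'tools', 'to', 'delve', 'deeper'], 0.14285714285714285)
```

In a realistic setting, `ai_score` would wrap a trained classifier such as the fine-tuned RoBERTa-QA detector, and `candidates` would come from a thesaurus or a masked language model; both choices are assumptions here.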

Paper Structure

This paper contains 31 sections, 1 equation, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The SimHash probability distribution of human-written and ChatGPT-generated essays, with a normal distribution curve fitted to the human-written data.
  • Figure 2: Detection performance curves for different essay categories using RoBERTa-QA (fine-tuned), where the X-axis represents perturbation depth.
  • Figure 3: Our proposed perturbation methods cause only a slight decrease in content quality: even after 20 words are replaced, the overall essay score drops by only 0.365.
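Figure 1 plots a SimHash-based distribution for human-written and ChatGPT-generated essays. The caption does not say how SimHash values become the plotted probabilities, so the following is only a sketch under stated assumptions: a standard 64-bit SimHash over lower-cased whitespace tokens weighted by term frequency, token hashes taken from MD5 (an arbitrary but stable choice), similarity defined as the share of matching fingerprint bits, and a normal curve fitted with `statistics.NormalDist.from_samples`, mirroring the curve fitted to the human-written data. The helper names are hypothetical.

```python
import hashlib
import statistics
from collections import Counter
from typing import List

BITS = 64


def _token_hash(token: str) -> int:
    # Stable 64-bit token hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """64-bit SimHash over lower-cased whitespace tokens, weighted by frequency."""
    weights = [0] * BITS
    for token, count in Counter(text.lower().split()).items():
        h = _token_hash(token)
        for b in range(BITS):
            weights[b] += count if (h >> b) & 1 else -count
    # Bit b of the fingerprint is set when the weighted vote for b is positive.
    return sum(1 << b for b in range(BITS) if weights[b] > 0)


def simhash_similarity(a: str, b: str) -> float:
    """Share of matching fingerprint bits (1.0 means identical fingerprints)."""
    return 1.0 - bin(simhash(a) ^ simhash(b)).count("1") / BITS


def fit_normal(values: List[float]) -> statistics.NormalDist:
    """Fit a normal curve (sample mean and stdev) to a list of values."""
    return statistics.NormalDist.from_samples(values)


print(simhash_similarity("The cat sat on the mat", "the cat sat on the mat"))  # 1.0
dist = fit_normal([0.55, 0.60, 0.62, 0.58, 0.65])  # toy values, not paper data
print(round(dist.mean, 3))  # 0.6
```

Near-duplicate texts share most fingerprint bits, while unrelated texts agree on roughly half of them; whether the paper uses SimHash in this pairwise way is an assumption of this sketch.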