Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing
Shoumik Saha, Soheil Feizi
TL;DR
This work investigates how AI-assisted polishing of human-written text confounds AI-text detectors. The authors introduce APT-Eval, a 14.7K-sample benchmark spanning six domains in which AI involvement is controlled either by degree or by percentage, and use it to evaluate 12 detectors, each with its own optimized decision threshold. The results show high false positive rates on minimally polished text, limited ability to distinguish degrees of AI involvement, bias against text polished by older or smaller models, and domain-specific vulnerabilities. The study advocates probabilistic or tiered labeling, training on AI-polished data, and human oversight, and releases the dataset openly to support fairer, more robust AI-detection methods.
Abstract
The growing use of large language models (LLMs) for text generation has led to widespread concerns about AI-generated content detection. However, an overlooked challenge is AI-polished text, where human-written content undergoes subtle refinements using AI tools. This raises a critical question: should minimally polished text be classified as AI-generated? Such classification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content. In this study, we systematically evaluate twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation (APT-Eval) dataset, which contains 14.7K samples refined at varying AI-involvement levels. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models. These limitations highlight the urgent need for more nuanced detection methodologies.
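To make the evaluation idea concrete, the following is a minimal sketch of how one might measure how often a detector flags polished text. It assumes a detector that returns a score in [0, 1] (higher meaning "more likely AI") and a threshold chosen to maximize F1 on purely human versus purely AI text. The selection criterion, the function names, and the toy scores are all illustrative assumptions, not the authors' implementation or data.

```python
from typing import Sequence


def f1_at(threshold: float, human: Sequence[float], ai: Sequence[float]) -> float:
    """F1 for the 'AI' class when scores >= threshold are flagged as AI."""
    tp = sum(s >= threshold for s in ai)      # AI text correctly flagged
    fp = sum(s >= threshold for s in human)   # human text wrongly flagged
    fn = len(ai) - tp                         # AI text missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(human: Sequence[float], ai: Sequence[float]) -> float:
    """Pick the observed score that maximizes F1 (assumed criterion)."""
    candidates = sorted(set(human) | set(ai))
    return max(candidates, key=lambda t: f1_at(t, human, ai))


def flag_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of texts whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical detector scores, invented for illustration only.
human_scores = [0.05, 0.10, 0.20, 0.30]      # unedited human text
ai_scores = [0.70, 0.80, 0.90, 0.95]         # fully AI-generated text
polished_scores = [0.15, 0.40, 0.75, 0.85]   # human text lightly polished by an LLM

threshold = best_threshold(human_scores, ai_scores)
print("threshold:", threshold)
print("flag rate on human text:", flag_rate(human_scores, threshold))
print("flag rate on polished text:", flag_rate(polished_scores, threshold))
```

In this toy setup the threshold separates human from AI text perfectly, yet half of the polished samples are still flagged, which is the kind of gap the study sets out to quantify.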
