Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text
Sara Abdali, Richard Anarfi, CJ Barberan, Jia He
TL;DR
This work surveys the risks posed by AI-generated text and the methods for detecting it, framing detection as a core mitigation for responsible AI governance. It groups detection techniques into five categories: supervised detection, zero-shot detection, retrieval-based detection, watermarking, and discriminating-feature analysis. For each, it analyzes strengths and key vulnerabilities, including susceptibility to paraphrasing, spoofing, and adversarial prompting. Theoretical analyses reveal fundamental limits on detectability through AUROC upper bounds tied to the distributional distance between human and AI-generated text, while also showing that detection improves as more samples become available and that robust watermarking faces intrinsic barriers under realistic assumptions. The paper then outlines concrete future directions (diverse datasets, interpretable features, advanced learning methods, multi-aspect evaluation, and hybrid strategies) to push toward reliable detection as LLM capabilities evolve. Overall, it argues for a principled combination of empirical techniques and theoretical grounding to advance practical and robust AI-generated text detection for safer deployment.
Abstract
Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we examine these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.
