Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text

Sara Abdali; Richard Anarfi; CJ Barberan; Jia He

TL;DR

This work surveys the risks of AI-generated text and the techniques for detecting it, framing detection as a core mitigation for responsible AI governance. It groups detection techniques into five categories (supervised, zero-shot, retrieval-based, watermarking-based, and discriminating-feature-based) and analyzes their strengths and key vulnerabilities, including susceptibility to paraphrasing, spoofing, and adversarial prompting. The theoretical analyses it reviews reveal fundamental limits on detectability through AUROC upper bounds tied to the distance between the AI and human text distributions, show that larger sample sizes can improve detection, and indicate that robust watermarking faces intrinsic barriers under realistic assumptions. The paper then outlines concrete future directions (diverse datasets, interpretable features, advanced learning methods, multi-aspect evaluation, and hybrid strategies) for reliable detection as LLM capabilities evolve. Overall, it argues for combining empirical techniques with theoretical grounding to make AI-generated text detection practical and robust enough for safer deployment.

Abstract

Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.

Paper Structure (28 sections, 8 equations, 3 figures, 1 table)

Introduction
Risks and Misuse of AI-Generated Text
Discrimination, Toxicity, and Harms
Factual Inconsistency and Unreliability of AI Responses
Copyright Infringement and Plagiarism
Misinformation Dissemination
AI-generated Text Detection Techniques
Supervised Detection
Zero-shot Detection
Retrieval-based Detection
Watermarking-based Detection
Detection via Discriminating Features
Vulnerabilities of Detection Techniques
Detection Possibility Through a Theoretical Lens
Impossibility via AUROC Upper-bound
...and 13 more sections

Figures (3)

Figure 1: An overview of responsible AI-generated text study, with an emphasis on detection approaches and their challenges.
Figure 2: A summary of detection vulnerabilities.
Figure 3: Comparing the AUROC of the optimal detector with that of a random classifier shows that as the total variation (TV) distance between the AI and human text distributions shrinks, the AUROC of the optimal detector decreases accordingly.
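
The trend described in the Figure 3 caption can be sketched numerically. The snippet below is only an illustration: it assumes the AUROC upper bound has the form AUROC <= 1/2 + TV - TV^2/2, a form commonly given for this kind of impossibility result but not spelled out on this page. The function name `auroc_upper_bound` and the sample TV values are hypothetical choices made for the example.

```python
# Minimal sketch, assuming the bound AUROC <= 1/2 + TV - TV^2 / 2.
# The formula is an assumption here; this page does not state it explicitly.

RANDOM_CLASSIFIER_AUROC = 0.5  # AUROC of a detector that guesses at random


def auroc_upper_bound(tv: float) -> float:
    """Return the assumed upper bound on any detector's AUROC.

    tv is the total variation distance between the AI and human text
    distributions, which must lie in [0, 1].
    """
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


if __name__ == "__main__":
    # Illustrative TV values only; they are not taken from the paper.
    for tv in (1.0, 0.5, 0.2, 0.1, 0.0):
        bound = auroc_upper_bound(tv)
        margin = bound - RANDOM_CLASSIFIER_AUROC
        print(f"TV={tv:.1f}  bound={bound:.3f}  margin over random={margin:.3f}")
```

Under this assumed form, the bound equals 1 when the two distributions are fully separable (TV = 1) and falls to 0.5, the random-classifier level, when they coincide (TV = 0).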
