Technical Report on the Pangram AI-Generated Text Classifier

Bradley Emi; Max Spero

TL;DR

Pangram Text is a transformer-based AI-text detector built to distinguish machine-generated text from human writing with very low false positive rates across diverse domains and unseen models. Its core techniques are mirror prompting and hard negative mining, combined in a curriculum-inspired training loop that scales to web-scale data while keeping optimization stable. Key contributions include a detailed scaling-law analysis, the synthetic-mirror data generation pipeline, and strong cross-domain and multilingual generalization, with near-perfect accuracy and resilience to new LLMs such as GPT-4. The work describes a practical, publicly available detector with a rigorous evaluation, and it acknowledges the need for ethical use and for corroborating detection results with additional evidence.

Abstract

We present Pangram Text, a transformer-based neural network trained to distinguish text written by large language models from text written by humans. Pangram Text outperforms zero-shot methods such as DetectGPT, as well as leading commercial AI detection tools, with error rates more than 38 times lower on a comprehensive benchmark comprising 10 text domains (student writing, creative writing, scientific writing, books, encyclopedias, news, email, scientific papers, short-form Q&A) and 8 open- and closed-source large language models. We propose a training algorithm, hard negative mining with synthetic mirrors, that enables our classifier to achieve orders of magnitude lower false positive rates on high-data domains such as reviews. Finally, we show that Pangram Text is not biased against nonnative English speakers and generalizes to domains and models unseen during training.
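As a rough illustration of hard negative mining with synthetic mirrors, the sketch below runs one round of the loop described in the Figure 1 caption: a classifier scores a pool of human text, the false positives are added to the training set as human examples, and each one is paired with an LLM-written mirror labeled as AI. All names here (`training_round`, `toy_fit`, `toy_mirror`, the 0.5 threshold) are hypothetical placeholders for illustration, not the authors' implementation; the toy classifier and mirror function stand in for a real model and a real LLM call.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]          # (text, label) with 0 = human, 1 = AI
Scorer = Callable[[str], float]    # returns an AI-likelihood score in [0, 1]


def mine_hard_negatives(score: Scorer, human_pool: List[str],
                        threshold: float = 0.5) -> List[str]:
    """Return human texts that the current classifier wrongly flags as AI."""
    return [text for text in human_pool if score(text) >= threshold]


def training_round(train_set: List[Example],
                   human_pool: List[str],
                   fit: Callable[[List[Example]], Scorer],
                   mirror: Callable[[str], str],
                   threshold: float = 0.5) -> Tuple[List[Example], int]:
    """Run one round: fit, mine false positives, add them plus AI mirrors."""
    score = fit(train_set)
    false_positives = mine_hard_negatives(score, human_pool, threshold)
    additions: List[Example] = []
    for text in false_positives:
        additions.append((text, 0))          # the hard negative itself
        additions.append((mirror(text), 1))  # its synthetic mirror
    return train_set + additions, len(false_positives)


# --- Toy stand-ins (assumptions, for demonstration only) ---
def toy_fit(train_set: List[Example]) -> Scorer:
    # Placeholder "training": flags any text containing the word "delve".
    return lambda text: 1.0 if "delve" in text.lower() else 0.0


def toy_mirror(text: str) -> str:
    # Placeholder for prompting an LLM to write a matching document.
    return f"<LLM text mirroring: {text}>"


seed = [("I walked to the store.", 0), ("As an AI, I delve into topics.", 1)]
pool = ["We delve into the cave at dawn.", "The cat slept all day."]
updated, n_false_positives = training_round(seed, pool, toy_fit, toy_mirror)
```

In a real pipeline the round would be repeated, with the classifier retrained on the enlarged set each time, until few new false positives are found in the pool.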

Paper Structure (29 sections, 7 figures, 6 tables, 1 algorithm)

Introduction
Algorithm
Training Algorithm
Results
Overview
Overall Performance
Performance by Domain
Performance by LLM
Performance on Nonnative English (ESL)
Performance on Out-of-Distribution Examples
Performance on LLMs unseen during training
July 2024 Update: Performance on Recently Released LLMs
July 2024 Update: Performance on non-English Languages
Additional Benchmark Information
Dataset Sources
...and 14 more sections

Figures (7)

Figure 1: Training process for the Pangram AI-generated text classifier. An initial classifier predicts on a large training pool of human examples, identifying false positives which are then added to the training set and mirrored by LLMs.
Figure 2: Overall results. (Left): Accuracy by detection method. (Right): False positive and false negative rates by detection method. Pangram has significantly higher accuracy than the next best methods, demonstrating state-of-the-art performance.
Figure 3: Accuracy by Domain. Pangram outperforms GPTZero and Originality on all 10 domains tested, demonstrating robustness to a wide variety of writing styles and formats.
Figure 4: False Positive and False Negative Rates by Domain. Other models show bias towards over- or under-predicting AI labels. Pangram is the only model that achieves both low FPR and FNR.
Figure 5: Recall at 1% FPR by LLM that generated the AI text. Pangram's performance remains strong on the most capable model, GPT-4, while the other models experience a severe degradation in performance.
...and 2 more figures
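The Figure 5 caption reports recall at 1% FPR. As a minimal sketch of how such a metric can be computed, the function below picks the decision threshold from scores on human text so that the false positive rate does not exceed the target, then measures the fraction of AI text scoring above it. The function name, the score convention (higher means more likely AI), and the sample scores are assumptions for illustration and are not taken from the report.

```python
from typing import List, Tuple


def recall_at_fpr(human_scores: List[float], ai_scores: List[float],
                  max_fpr: float = 0.01) -> Tuple[float, float]:
    """Return (recall, threshold), predicting AI when score > threshold.

    The threshold is chosen so that the share of human texts flagged
    as AI is at most max_fpr.
    """
    if not ai_scores:
        raise ValueError("ai_scores must not be empty")
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we may flag; the epsilon guards against
    # floating-point products landing just under an integer.
    allowed = int(max_fpr * len(ranked) + 1e-9)
    # Using the (allowed + 1)-th highest human score as a strict cutoff
    # flags at most `allowed` human texts, even when scores are tied.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    detected = sum(1 for s in ai_scores if s > threshold)
    return detected / len(ai_scores), threshold


# Illustrative scores only, with a 10% target so ten samples suffice.
human = [0.05, 0.1, 0.2, 0.3, 0.1, 0.15, 0.02, 0.4, 0.9, 0.25]
ai = [0.95, 0.8, 0.5, 0.3]
recall, threshold = recall_at_fpr(human, ai, max_fpr=0.1)
```

Fixing the false positive rate first makes detectors comparable even when their raw scores are calibrated differently, which is why the metric is reported per generating LLM.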
