GPTZero: Robust Detection of LLM-Generated Texts

George Alexandru Adam, Alexander Cui, Edwin Thomas, Emily Napier, Nazar Shmatko, Jacob Schnell, Jacob Junqi Tian, Alekhya Dronavalli, Edward Tian, Dongwon Lee

TL;DR

GPTZero addresses the problem of authenticating text authorship in the era of LLMs by introducing a hierarchical, multi-task detector that yields document-level and sentence-level predictions. The detector uses a ternary top-level taxonomy (Human, AI, Mixed) with a fine-grained AI substructure; it is trained with a multi-task loss and hardened through multi-tiered red teaming against paraphrasing and adversarial edits. It offers Deep Scan attributions for interpretability, robust multilingual and domain-general performance, and explicit polymorphic handling of polished content. Empirical results show state-of-the-art accuracy with low false positives across diverse domains and languages, along with strong resilience to sophisticated bypass attempts, which makes it practical for exams, publishing, and content platforms. Limitations include evaluation standardization, data engineering demands, and the ongoing need for public benchmarks to monitor generalization to new LLMs.

Abstract

While historical considerations surrounding text authenticity revolved primarily around plagiarism, the advent of large language models (LLMs) has introduced a new challenge: distinguishing human-authored from AI-generated text. This shift raises significant concerns, including the undermining of skill evaluations, the mass production of low-quality content, and the proliferation of misinformation. To address these issues, we introduce GPTZero, a state-of-the-art industrial AI detection solution that reliably distinguishes human-written from LLM-generated text. Our key contributions include: introducing a hierarchical, multi-task architecture that enables a flexible taxonomy of human and AI texts; demonstrating state-of-the-art accuracy on a variety of domains with granular predictions; and achieving superior robustness to adversarial attacks and paraphrasing via multi-tiered automated red teaming. GPTZero offers accurate and explainable detection and educates users on its responsible use, ensuring fair and transparent assessment of text.
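To make the idea of a hierarchical, multi-task objective more concrete, the following is a minimal sketch of how document-level, AI-subtype, and sentence-level losses might be combined. It is an illustration only, not the authors' implementation: the function names, the number of AI subtypes, the rule that the subtype loss applies only to AI-labeled documents, and the weights `w_sub` and `w_sent` are all assumptions. Only the ternary top level (Human, AI, Mixed) and the existence of document and sentence predictions come from the text above.

```python
import math

# Ternary top-level taxonomy named in the summary above.
DOC_CLASSES = ("human", "ai", "mixed")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Multi-class cross-entropy for a single example."""
    return -math.log(softmax(logits)[target_index])


def binary_cross_entropy(logit, target):
    """Stable binary cross-entropy on a raw logit (target is 0 or 1)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multitask_loss(doc_logits, doc_label, sub_logits, sub_label,
                   sent_logits, sent_labels, w_sub=0.5, w_sent=0.5):
    """Hypothetical combined loss; the weights are illustrative assumptions."""
    # Document head: Human / AI / Mixed.
    loss = cross_entropy(doc_logits, DOC_CLASSES.index(doc_label))
    # AI-subtype head: assumed to contribute only for AI-labeled documents.
    if doc_label == "ai" and sub_label is not None:
        loss += w_sub * cross_entropy(sub_logits, sub_label)
    # Sentence head: mean per-sentence AI-vs-human loss.
    if sent_labels:
        per_sentence = [binary_cross_entropy(z, y)
                        for z, y in zip(sent_logits, sent_labels)]
        loss += w_sent * sum(per_sentence) / len(per_sentence)
    return loss


if __name__ == "__main__":
    # Uninformative (all-zero) logits for an AI document with two sentences.
    print(multitask_loss([0.0, 0.0, 0.0], "ai", [0.0, 0.0], 1,
                         [0.0, 0.0], [1, 0]))
```

Gating the subtype term on the top-level label is one simple way to express a hierarchy in a loss; other designs, such as conditional probabilities multiplied down the tree, would also fit the description.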

Paper Structure (39 sections, 3 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: The process of developing and improving our deep learning model for detecting LLM-generated texts.
  • Figure 2: Hierarchical Classifier Heads
  • Figure 3: GPTZero's multi-tiered red teaming approach covers a variety of adversarial threats, providing unprecedented robustness.
  • Figure 4: Drops in AI probability after removing top-k% of high AI impact sentences predicted by our DeepScan Feature (Blue) and Detector Sentence Head (Orange)
  • Figure 5: Levenshtein ratio for polished sample misclassifications.
  • ...and 6 more figures
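
For readers unfamiliar with the metric named in the Figure 5 caption, here is a small sketch of one common way to compute a Levenshtein ratio between an original text and its polished version. The normalization used here (one minus edit distance divided by the longer length) is an assumption; the paper may use a different definition, such as the indel-based ratio found in some libraries.

```python
def levenshtein_distance(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    original = "The results was very good."
    polished = "The results were very good."
    print(levenshtein_ratio(original, polished))
```

Under this definition, a lightly polished text stays close to 1.0, while a heavily rewritten one moves toward 0.0.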