Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

TL;DR

This work introduces Binoculars, a zero-shot, model-agnostic detector for machine-generated text that leverages a cross-perplexity ratio between two closely related LLMs. By normalizing perplexity with cross-perplexity, Binoculars mitigates prompt-induced distortions and yields strong detection across diverse domains without training data. Comprehensive evaluations demonstrate superior out-of-domain performance compared to baselines and robustness to prompting variations, while highlighting limitations in low-resource languages and memorization scenarios. The study emphasizes practical deployment considerations, including false-positive control, reliability in the wild, and ethical implications for moderation and content authenticity. Overall, Binoculars advances zero-shot LLM detection with a simple, transferable scoring mechanism that generalizes across models and tasks.

Abstract

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
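The "simple calculations using a pair of pre-trained LLMs" can be illustrated with a minimal sketch. It assumes the score is the log-perplexity of the text under one model divided by the cross-perplexity, taken here to be the average cross-entropy between the two models' next-token distributions. The function names, the toy logits, and the choice of which model plays which role are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary (last) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def binoculars_score(m1_logits, m2_logits, token_ids):
    """Hypothetical helper: log-perplexity divided by cross-perplexity.

    m1_logits, m2_logits: arrays of shape (L, V); row i holds each model's
        next-token logits for position i of the same tokenized text.
    token_ids: integer array of shape (L,) with the tokens actually observed.
    """
    m1_logp = log_softmax(np.asarray(m1_logits, dtype=float))
    m2_logp = log_softmax(np.asarray(m2_logits, dtype=float))
    positions = np.arange(len(token_ids))

    # Log-perplexity: average negative log-likelihood of the observed tokens
    # under model 1.
    log_ppl = -m1_logp[positions, token_ids].mean()

    # Cross-perplexity (assumed form): average cross-entropy between model 1's
    # predicted distribution and model 2's predicted distribution.
    cross_ppl = -(np.exp(m1_logp) * m2_logp).sum(axis=-1).mean()

    return log_ppl / cross_ppl


# Toy usage with random logits standing in for real model outputs.
rng = np.random.default_rng(0)
toy_m1 = rng.normal(size=(6, 10))
toy_m2 = toy_m1 + 0.1 * rng.normal(size=(6, 10))  # a "closely related" model
toy_tokens = rng.integers(0, 10, size=6)
print(binoculars_score(toy_m1, toy_m2, toy_tokens))
```

In this sketch, text that model 1 finds much less surprising than the two models' mutual disagreement would predict receives a low score; treating low scores as the machine-generated side of the threshold is an assumption of the example.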

Paper Structure

This paper contains 32 sections, 4 equations, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: Detection of Machine-Generated Text from ChatGPT. Our detection approach using Binoculars is highly accurate at separating machine-generated and human-written samples from News, Creative Writing and Student Essay datasets with a false positive rate of $0.01\%$. Binoculars, based on open-source Falcon models with no finetuning, outperforms commercial detection systems, such as GPTZero, as well as open-source detectors, even though both of these baselines are specifically tuned to detect ChatGPT (Verma et al., 2023; Tian, 2023). Our approach operates entirely in a zero-shot setting and has not been tuned on ChatGPT specifically.
  • Figure 2: Impact of Document Size on Detection Performance. The plot displays the TPR at 0.01% FPR across varying document sizes by prefixing sample documents. The x-axis represents the number of tokens of the observed document, while the y-axis indicates the corresponding detection performance, highlighting Binoculars' ability to detect machine-generated text from a small number of tokens.
  • Figure 3: Detecting LLaMA-2-13B generations. Binoculars achieves higher TPRs for low FPRs (on log scale) than other methods.
  • Figure 4: Detection of ChatGPT-generated text in various domains from the M4 Dataset. Binoculars is more precise over 4 domains using the OOD threshold for detection. We use the mean of out-of-domain performance metrics reported by Wang et al. (2023).
  • Figure 5: Performance of Binoculars on samples from various generative models.
  • ...and 10 more figures
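
Several captions above report the true positive rate (TPR) at a fixed false positive rate (FPR). A minimal sketch of that metric follows, assuming a detector that flags a text as machine-generated when its score falls strictly below a threshold; the function name `tpr_at_fpr` and the toy scores are hypothetical and not taken from the paper.

```python
import numpy as np


def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Hypothetical helper: TPR at the loosest threshold whose FPR stays
    at or below target_fpr.

    Assumes lower scores mean "more machine-like", so a text is flagged
    when score < threshold.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    machine = np.asarray(machine_scores, dtype=float)

    # Number of human-written samples we are allowed to flag by mistake.
    allowed = int(np.floor(target_fpr * len(human)))

    # Placing the threshold at the (allowed + 1)-th smallest human score means
    # at most `allowed` human samples lie strictly below it.
    threshold = human[allowed] if allowed < len(human) else np.inf

    tpr = float((machine < threshold).mean())
    return tpr, threshold


# Toy usage: four human-written and four machine-generated scores.
human_toy = [0.9, 1.0, 1.1, 1.2]
machine_toy = [0.5, 0.7, 0.95, 1.05]
print(tpr_at_fpr(human_toy, machine_toy, target_fpr=0.25))
```

Under this counting scheme, an FPR as low as 0.01% can only be resolved with at least 10,000 human-written samples; with fewer, the number of allowed false positives rounds down to zero.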