Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
TL;DR
This work introduces Binoculars, a zero-shot, model-agnostic detector for machine-generated text. It scores a passage by the ratio of its perplexity to a cross-perplexity computed between two closely related LLMs. Normalizing perplexity by cross-perplexity mitigates prompt-induced distortions and yields strong detection across diverse domains without any training data. Comprehensive evaluations show better out-of-domain performance than baselines and robustness to prompting variations, while also exposing limitations in low-resource languages and memorization scenarios. The study emphasizes practical deployment considerations, including false-positive control, reliability in the wild, and ethical implications for moderation and content authenticity. Overall, Binoculars advances zero-shot LLM detection with a simple, transferable scoring mechanism that generalizes across models and tasks.
Abstract
Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
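To make the "simple calculations using a pair of pre-trained LLMs" concrete, the following is a minimal sketch of a perplexity to cross-perplexity ratio, not the authors' implementation. It assumes the score is the log-perplexity of the text under one model divided by the average cross-entropy between the two models' next-token distributions, and that both models share a tokenizer. The function name `binoculars_score` and the arguments `logits_a`, `logits_b`, and `token_ids` are hypothetical; in practice the logits would come from two real models, and which model plays which role is a design choice not fixed by the text above.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def binoculars_score(logits_a, logits_b, token_ids):
    """Ratio of log-perplexity to log cross-perplexity (assumed form).

    logits_a, logits_b: arrays of shape (L, V) holding each model's
        next-token logits, aligned so that row i predicts token_ids[i].
    token_ids: length-L sequence of the observed token ids.
    """
    logp_a = log_softmax(np.asarray(logits_a, dtype=np.float64))
    logp_b = log_softmax(np.asarray(logits_b, dtype=np.float64))
    token_ids = np.asarray(token_ids)
    positions = np.arange(len(token_ids))

    # Log-perplexity: mean negative log-likelihood of the observed tokens
    # under model A.
    log_ppl = -logp_a[positions, token_ids].mean()

    # Log cross-perplexity: mean cross-entropy between model A's predicted
    # distribution and model B's log-probabilities at each position.
    log_x_ppl = -(np.exp(logp_a) * logp_b).sum(axis=-1).mean()

    return log_ppl / log_x_ppl


# Toy usage with made-up logits (vocabulary of 4 tokens, 3 positions).
peaked = np.array([[4.0, 0.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0, 0.0],
                   [0.0, 0.0, 4.0, 0.0]])
expected_tokens = [0, 1, 2]    # tokens the model finds likely
surprising_tokens = [3, 3, 3]  # tokens the model finds unlikely

score_expected = binoculars_score(peaked, peaked, expected_tokens)
score_surprising = binoculars_score(peaked, peaked, surprising_tokens)
```

In this toy setup the text made of tokens the model expects receives the lower score. Under the assumed form, a low score would point toward machine-generated text, and a decision threshold would be chosen to meet a target false positive rate such as the 0.01% quoted above.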
