Smaller Language Models are Better Black-box Machine-Generated Text Detectors

Niloofar Mireshghallah; Justus Mattern; Sicun Gao; Reza Shokri; Taylor Berg-Kirkpatrick

TL;DR

The paper tackles the problem of detecting machine-generated text without knowing the generator by applying a curvature-based local-optimality test on a surrogate detector's likelihood. It demonstrates that smaller, partially trained detectors serve as universal cross-detectors, often approaching the performance of self-detection and sometimes outperforming larger, fully trained detectors. Key findings show that perturbation quality, masking strategies, and sequence length significantly influence detection power, with smaller models better at distinguishing generated text across a range of generators. The work suggests practical implications for black-box detection of LLM outputs and highlights the need for further validation as models evolve.

Abstract

With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written becomes more challenging and more important, as such models could be used to spread misinformation, fake news, and fake reviews, and to mimic certain authors and figures. To this end, a slew of methods has been proposed to detect machine-generated text. Most of these methods need access to the logits of the target model or need the ability to sample from the target. One such black-box detection method relies on the observation that generated text is locally optimal under the likelihood function of the generator, while human-written text is not. We find that overall, smaller and partially-trained models are better universal text detectors: they can more precisely detect text generated from both small and larger models. Interestingly, we find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45.
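The local-optimality idea in the abstract can be illustrated with a minimal sketch: score a text by how much its log-likelihood under a detector exceeds the average log-likelihood of perturbed copies. Everything named below is an assumption made for illustration and is not taken from the paper: the function names, the toy unigram "detector", and the random-replacement perturbation stand in for a real surrogate language model and a mask-filling perturbation model.

```python
import math
import random
from typing import Callable, List

Tokens = List[str]


def local_optimality_score(
    tokens: Tokens,
    log_likelihood: Callable[[Tokens], float],
    perturb: Callable[[Tokens], Tokens],
    n_perturbations: int = 20,
) -> float:
    """Log-likelihood of the text minus the mean log-likelihood of its neighbors.

    A clearly positive value suggests the text sits near a local maximum
    of the detector's likelihood.
    """
    original = log_likelihood(tokens)
    neighbors = [log_likelihood(perturb(tokens)) for _ in range(n_perturbations)]
    return original - sum(neighbors) / len(neighbors)


# Hypothetical stand-in for a detector model: a fixed unigram distribution.
TOY_UNIGRAM = {"the": 0.5, "cat": 0.2, "sat": 0.2, "zygote": 0.05, "quark": 0.05}


def toy_log_likelihood(tokens: Tokens) -> float:
    return sum(math.log(TOY_UNIGRAM[t]) for t in tokens)


def make_toy_perturb(mask_fraction: float, rng: random.Random) -> Callable[[Tokens], Tokens]:
    """Hypothetical perturbation: overwrite a fraction of positions with random vocabulary words."""
    vocab = sorted(TOY_UNIGRAM)

    def perturb(tokens: Tokens) -> Tokens:
        out = list(tokens)
        n_mask = max(1, round(mask_fraction * len(out)))
        for i in rng.sample(range(len(out)), n_mask):
            out[i] = rng.choice(vocab)
        return out

    return perturb


rng = random.Random(0)
perturb = make_toy_perturb(mask_fraction=0.15, rng=rng)

# A sequence of the toy detector's most likely token is locally optimal under it.
likely_text = ["the"] * 10
# A sequence of its least likely token is not: perturbations can only help.
unlikely_text = ["quark"] * 10

high = local_optimality_score(likely_text, toy_log_likelihood, perturb)
low = local_optimality_score(unlikely_text, toy_log_likelihood, perturb)
```

In this toy setting `high` is non-negative and `low` is non-positive, so thresholding the score separates the two sequences. With a real detector, `log_likelihood` would be computed by the surrogate language model, and the masking fraction and perturbation quality would matter, as the TL;DR notes.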

Paper Structure (22 sections, 1 equation, 12 figures)

Introduction
Methodology
Experimental Setup
Does cross-detection work?
Smaller Models Are Better Detectors
Partially Trained Models are Better Detectors
How are smaller models better detectors?
Does neighborhood choice matter?
Masking Percentage
How many tokens do we need for detection?
Related Work
Relationship to Membership Inference Attacks (MIA)
Conclusion
Ablating Mask Filling Models
Experimental Setup
...and 7 more sections

Figures (12)

Figure 1: We want to study how models can cross-detect, i.e. distinguish between human-written text and machine-generated text generated by another model. To this end, we create a target pool consisting of both human-written and machine-generated text. We then generate perturbations of each target sequence using a perturbation model. We find the likelihood of the target pool and perturbations under a detector model in order to estimate the local optimality under the detector model's likelihood. We use the estimate of local optimality to determine if a sequence is machine generated or not.
Figure 2: AUC heatmap for cross-detection, where the rows are generator models and the columns are surrogate detector models, both sorted by model size. Smaller models are better detectors, and larger models have the weakest detection power.
Figure 3: Summary of the results for cross-detection power of different detector models trained for different numbers of steps. Each subfigure shows a different detector model, and the x-axis shows the training step of the checkpoint used as the detector. The results for all 15 generator models are shown in Figure \ref{fig:main_heatmap_checkpoint}.
Figure 4: Comparison of curvature and log likelihood values (mean and standard deviation) for the best universal detector (OPT-125M), a medium sized detector (OPT-350M), and a larger detector from the same family (OPT-6.7B) on generations from models of various sizes (x-axis). The 'Detector Model' line shows values for when the generator and detector are the same model. Detectors tend to show higher curvature on generations than human-written text only for generations from models of the same size or larger.
Figure 5: AUC of the three cross-detectors from Figure \ref{fig:curv_auc_bad_good}.
...and 7 more figures
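
The AUC values reported in the abstract and in the Figure 2 heatmap summarize how well a detector's scores rank machine-generated text above human-written text. As a rough illustration only, the sketch below computes AUC directly from two lists of scores using the pairwise (Mann-Whitney) formulation. The function name and the example scores are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def auc_from_scores(machine_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Fraction of (machine, human) pairs where the machine text scores higher.

    Ties count as half a win. 1.0 means perfect ranking, 0.5 is chance level,
    and values below 0.5 mean the detector ranks human text above machine text
    more often than not.
    """
    if not machine_scores or not human_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


# Made-up detector scores, for illustration only.
example_auc = auc_from_scores([0.9, 0.4, 0.7], [0.5, 0.2, 0.1])
```

Here 8 of the 9 machine/human pairs are ranked correctly, so `example_auc` is 8/9. Under this reading, an AUC of 0.45 is slightly worse than chance, while 0.81 means most pairs are ranked correctly.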
