Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection

Guangsheng Bao, Yanbin Zhao, Juncai He, Yue Zhang

TL;DR

Glimpse, a probability distribution estimation approach, predicts full distributions from partial observations. It extends white-box methods such as Entropy, Rank, Log-Rank, and Fast-DetectGPT to the latest proprietary models and demonstrates that advanced LLMs may be the best shield against themselves.

Abstract

Advanced large language models (LLMs) can generate text almost indistinguishable from human-written text, highlighting the importance of LLM-generated text detection. However, current zero-shot techniques face two challenges: white-box methods are restricted to weaker open-source LLMs, and black-box methods are limited by the partial observations available from stronger proprietary LLMs. It seems impossible to enable white-box methods to use proprietary models, because API-level access provides neither full predictive distributions nor inner embeddings. To bridge this gap, we propose **Glimpse**, a probability distribution estimation approach that predicts the full distributions from partial observations. Despite the simplicity of Glimpse, we successfully extend white-box methods such as Entropy, Rank, Log-Rank, and Fast-DetectGPT to the latest proprietary models. Experiments show that Glimpse with Fast-DetectGPT and GPT-3.5 achieves an average AUROC of about 0.95 across five of the latest source models, improving the score by 51% relative to the remaining space of the open-source baseline. This demonstrates that the latest LLMs can effectively detect their own outputs, suggesting that advanced LLMs may be the best shield against themselves. We release our code and data at https://github.com/baoguangsheng/glimpse.
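
To make "predicting the full distributions from partial observations" concrete, the following is a minimal sketch rather than the released implementation. It assumes the API returns only the top-$K$ probabilities at a position and that the unobserved tail decays geometrically below the $K$-th probability; the function names `complete_geometric` and `entropy`, the bisection solver, and the final rescaling step are all illustrative choices, not taken from the paper's code. Once a full distribution is available, a white-box metric such as Entropy can be computed as usual.

```python
import numpy as np

def complete_geometric(top_probs, vocab_size):
    """Complete top-K probabilities into a full distribution over vocab_size ranks.

    Illustrative assumption: the unobserved tail follows
    p_{K+i} = p_K * lam**i, with lam chosen so that the tail carries
    the probability mass not covered by the top-K entries.
    Assumes all top-K probabilities are strictly positive.
    """
    top = np.sort(np.asarray(top_probs, dtype=float))[::-1]
    k = len(top)
    n_tail = vocab_size - k
    rest = 1.0 - top.sum()
    full = np.zeros(vocab_size)
    if n_tail <= 0 or rest <= 0:
        # Nothing left to distribute: renormalize the observed part.
        full[:k] = top / top.sum()
        return full[:max(vocab_size, k)]

    steps = np.arange(1, n_tail + 1)

    def tail_mass(lam):
        return top[-1] * np.sum(lam ** steps)

    # tail_mass is increasing in lam, so bisection on (0, 1) finds the decay rate.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tail_mass(mid) < rest:
            lo = mid
        else:
            hi = mid
    tail = top[-1] * (0.5 * (lo + hi)) ** steps
    # Rescale so the result sums to one even if the bisection is inexact.
    tail *= rest / tail.sum()

    full[:k] = top
    full[k:] = tail
    return full

def entropy(dist):
    """Shannon entropy (in nats) of a probability vector, skipping zero entries."""
    p = np.asarray(dist, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Usage: three observed probabilities, hypothetical vocabulary of 1000 tokens.
estimated = complete_geometric([0.5, 0.2, 0.1], vocab_size=1000)
print(entropy(estimated))
```

The geometric tail is only one possible shape; any monotone decay that preserves the observed top-$K$ values and sums to one could be substituted in the same place.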

Paper Structure

This paper contains 28 sections, 27 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Applying Glimpse, with Fast-DetectGPT as an example. The notation $\widetilde{\text{GPT}}$ refers to the model with an estimated distribution, where the partial observation (top-$K$ probabilities) returned by the model API is completed into a full distribution. The 'token' column is shown for reference only; it is not needed to calculate the metric (conditional probability curvature).
  • Figure 2: KL divergence against real distributions from Neo-2.7B.
  • Figure 3: Correlation between AUROC and KL divergence, evaluated on XSum produced by GPT-4. We use the open-source model Neo-2.7B as the scoring model for Glimpse algorithms.
  • Figure 4: Ablation on estimation algorithm, where the AUROC is averaged across the five source models. Each dataset has its own preferred algorithm.
  • Figure 5: Ablation on top-$K$, where the AUROC is averaged across the datasets produced by GPT-4. Each line represents a combination of methods and scoring models.
  • ...and 4 more figures
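
The Figure 1 caption notes that the conditional probability curvature needs only probabilities, not the token strings. A minimal sketch of why, assuming the analytic form of the curvature in which the same model serves for sampling and scoring: the statistic compares the log-probability of the observed tokens with the mean and variance of the log-probability under each position's (estimated) distribution. The function name `conditional_probability_curvature` and its input layout are hypothetical, and this is not the authors' code.

```python
import numpy as np

def conditional_probability_curvature(token_log_probs, distributions):
    """Curvature-style score from per-position distributions.

    token_log_probs: log-probability of the observed token at each position.
    distributions:   one full (possibly estimated) probability vector per position.
    Assumes at least one position has a non-degenerate distribution,
    so the total variance is positive.
    """
    mean_total = 0.0
    var_total = 0.0
    for dist in distributions:
        p = np.asarray(dist, dtype=float)
        p = p[p > 0]                      # ignore zero-probability entries
        log_p = np.log(p)
        mean = np.sum(p * log_p)          # expected log-probability at this position
        var = np.sum(p * log_p ** 2) - mean ** 2
        mean_total += mean
        var_total += var
    observed = float(np.sum(token_log_probs))
    return (observed - mean_total) / np.sqrt(var_total)

# Usage with two positions sharing a toy three-token distribution.
dist = [0.7, 0.2, 0.1]
likely = conditional_probability_curvature(np.log([0.7, 0.7]), [dist, dist])
unlikely = conditional_probability_curvature(np.log([0.1, 0.1]), [dist, dist])
print(likely, unlikely)
```

Text whose tokens sit near the top of each distribution scores higher than text built from low-probability tokens, which is the signal such a detector thresholds on.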