Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling

Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang

TL;DR

The paper tackles AI-generated text detection in black-box settings by estimating white-box-like word-generation probabilities through proxy-guided, efficient re-sampling. It introduces POGER, which selects a small set of low-probability words via a proxy model, performs targeted re-sampling, and fuses probabilistic signals with context-aware features to classify texts. Across binary, multiclass, and out-of-distribution scenarios on texts from humans and seven LLMs, POGER outperforms baselines while reducing sampling costs. The approach demonstrates strong generalization and robustness, offering a practical barrier against AI-generated misinformation and misuse in real-world applications.

Abstract

With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
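The core idea in the abstract, estimating a word's generation probability by re-sampling a black-box model many times, can be illustrated with a simple frequency estimate. The sketch below is not the authors' implementation: `sample_next_word` is a hypothetical stand-in for a call to a black-box LLM API, and `make_toy_sampler` is an assumed toy model with a fixed next-word distribution so the example runs offline.

```python
import random
from typing import Callable, Dict, Sequence


def estimate_word_probability(
    sample_next_word: Callable[[Sequence[str]], str],
    context: Sequence[str],
    target: str,
    n_samples: int = 100,
) -> float:
    """Estimate P(target | context) as the fraction of re-samples equal to target."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    hits = sum(1 for _ in range(n_samples) if sample_next_word(context) == target)
    return hits / n_samples


def make_toy_sampler(
    distribution: Dict[str, float], seed: int = 0
) -> Callable[[Sequence[str]], str]:
    """Hypothetical black-box model: ignores context, samples from a fixed distribution."""
    rng = random.Random(seed)
    words = list(distribution)
    weights = list(distribution.values())

    def sample(context: Sequence[str]) -> str:
        return rng.choices(words, weights=weights, k=1)[0]

    return sample


if __name__ == "__main__":
    sampler = make_toy_sampler({"cat": 0.7, "dog": 0.2, "fish": 0.1})
    print(estimate_word_probability(sampler, ["the"], "cat", n_samples=2000))
```

Each re-sample costs one model call per word, which is why restricting the estimate to a small subset of words (e.g., 10) keeps the overall cost low.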

Paper Structure

This paper contains 44 sections, 8 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Paradigm comparison between our proposed POGER and existing white-box/black-box methods. POGER does not require LLMs' internal states like output logits and performs better than the other types of baselines under black-box and out-of-distribution (OOD) settings.
  • Figure 2: Detection performance using estimated probabilities under different (a) sampling times and (b) sampling temperatures.
  • Figure 3: Architecture of POGER. Given a text, POGER operates in three steps: 1) Error-aware word selection, where a white-box LLM serves as a proxy to nominate candidate low-probability words and the bottom-$k$ word selector keeps the $k$ lowest-probability words that satisfy the estimation error bound; 2) Probability estimation, where multiple re-sampling is applied to candidate black-box LLMs for the selected $k$ words and a pseudo probabilistic feature $\mathbf{L}$ consisting of the estimated probabilities is computed; 3) Classification, where a contextual feature $\mathbf{C}$ is introduced to compensate for the context loss in $\mathbf{L}$, yielding an enhanced feature $\mathbf{F}$ for the final binary or multiclass AI-generated text detection.
  • Figure 4: Distribution of attention weight for words in different probability ranking intervals.
  • Figure 5: Overlap proportion of low-probability words between different LLMs.
  • ...and 1 more figure
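
The first step in the Figure 3 caption, bottom-$k$ word selection guided by a proxy model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `proxy_probs` is assumed to hold the proxy LLM's probability for each word position, and `min_prob` is a hypothetical probability floor standing in for the estimation error bound, whose exact form is not given in this excerpt.

```python
from typing import List, Sequence


def select_bottom_k(
    proxy_probs: Sequence[float], k: int = 10, min_prob: float = 0.0
) -> List[int]:
    """Return positions of the k lowest-probability words, in text order.

    Words whose proxy probability falls below `min_prob` are skipped; this
    floor is an assumed stand-in for an estimation error bound.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    eligible = [i for i, p in enumerate(proxy_probs) if p >= min_prob]
    # Sort by probability, breaking ties by position for determinism.
    eligible.sort(key=lambda i: (proxy_probs[i], i))
    return sorted(eligible[:k])


if __name__ == "__main__":
    probs = [0.9, 0.05, 0.4, 0.001, 0.2]
    print(select_bottom_k(probs, k=2))
    print(select_bottom_k(probs, k=2, min_prob=0.01))
```

Only the returned positions would then be re-sampled against the candidate black-box LLMs, so the number of model calls scales with $k$ rather than with the text length.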