Black-box Detection of LLM-generated Text Using Generalized Jensen-Shannon Divergence

Shuangyi Chen, Ashish Khisti

TL;DR

This work addresses black-box detection of LLM-generated text by leveraging token surprisal dynamics without per-instance regeneration. SurpMark discretizes surprisal into a finite state space and models state transitions as a Markov chain, scoring test texts against fixed human and machine references via a generalized Jensen-Shannon divergence. The authors provide a principled discretization criterion, prove the decision statistic is a normalized log-likelihood ratio with asymptotic normality, and demonstrate strong empirical performance across multiple datasets, models, and languages. The approach offers scalable, low-latency detection suitable for real-world deployment, with robust performance under domain and proxy-model shifts. The framework also highlights a favorable trade-off between reference cost and detection accuracy, achieving competitive results while reducing per-input computational burden.

Abstract

We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark quantizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from historical corpora. We derive a principled discretization criterion and prove the asymptotic normality of the decision statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines; our experiments corroborate the statistic's asymptotic normality, and ablations validate the effectiveness of the proposed discretization.
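The first two steps of this pipeline (surprisal, then quantization into states) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the proxy LM's per-token probabilities are already available, and it uses equal-mass quantile edges fitted on pooled reference surprisals as a stand-in for the paper's discretization criterion. The function names `surprisals`, `fit_quantile_edges` and `quantize` are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def surprisals(token_probs: Sequence[float]) -> List[float]:
    """Surprisal s_t = -log p(x_t | x_<t), in nats, for each scored token."""
    return [-math.log(p) for p in token_probs]


def fit_quantile_edges(reference_surprisals: Sequence[float], k: int) -> List[float]:
    """Hypothetical shared k-bin quantizer: k-1 interior edges placed at
    equal-mass quantiles of the pooled reference surprisals.
    (Assumption: this is NOT the paper's discretization criterion.)"""
    xs = sorted(reference_surprisals)
    if not xs:
        raise ValueError("need at least one reference surprisal")
    n = len(xs)
    return [xs[min(n - 1, (i * n) // k)] for i in range(1, k)]


def quantize(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each surprisal to a state in {0, ..., k-1}; state k-1 is the most surprising."""
    return [bisect.bisect_right(edges, v) for v in values]
```

The same edges are reused for the human reference, the machine reference and every test passage. All three are therefore summarized over one shared state alphabet, which keeps their transition statistics comparable.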

Paper Structure

This paper contains 55 sections, 14 theorems, 100 equations, 8 figures, 10 tables, 2 algorithms.

Key Result

Proposition 4.1

Let $\mathcal{S}_P,\mathcal{S}_Q$ be the population first-order Markov transition kernels on the continuous surprisal space $\mathbb{R}$. Consider a shared $k$-bin quantizer $q_k:\mathbb{R} \rightarrow \mathcal{A}$ and, from it, form the discretized $k$-state Markov chains $M_P, M_Q$. For any row-ag […] where $C$ depends on $(P,Q,f)$ but not on the reference length $N$.
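In practice, the discretized chains $M_P, M_Q$ are observed only through finitely many transitions. A minimal sketch of estimating such a chain from quantized surprisal sequences is given below. Two choices are assumptions made for illustration: transitions are not counted across passage boundaries, and the counts use additive smoothing with constant `alpha`. The names `pooled_counts` and `transition_matrix` are hypothetical.

```python
from typing import Iterable, List, Sequence


def pooled_counts(sequences: Iterable[Sequence[int]], k: int) -> List[List[int]]:
    """Count first-order transitions a -> b over many passages.

    Each passage is a list of states in {0, ..., k-1}; transitions are counted
    only within a passage, never across passage boundaries (assumed pooling rule).
    """
    counts = [[0] * k for _ in range(k)]
    for states in sequences:
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts: Sequence[Sequence[int]], alpha: float = 0.5) -> List[List[float]]:
    """Row-normalize a k x k count matrix into a transition kernel.

    alpha is an additive smoothing constant (an assumed choice); if a row has
    no observations and alpha == 0, it falls back to the uniform distribution.
    """
    k = len(counts)
    matrix = []
    for row in counts:
        total = sum(row) + alpha * k
        if total == 0:
            matrix.append([1.0 / k] * k)
        else:
            matrix.append([(c + alpha) / total for c in row])
    return matrix
```

Quantities such as those plotted in Figure 2(a), the probabilities of moving into and out of the most surprising state, are entries of a matrix like this one. For example, `P[k-1][k-1]` is the probability of staying in state `k-1`.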

Figures (8)

  • Figure 1: SurpMark framework. Offline, we build human/machine reference transition matrices by scoring corpora with a proxy LM, discretizing surprisal via a shared $q_k$, and counting state transitions. Online, a test passage is summarized the same way and assigned a GJS score that measures its proximity to the human vs. machine references. Details are in the offline and online SurpMark algorithms in the appendix.
  • Figure 2: (a) The key feature driving our detector: the conditional probabilities of transitioning into and out of the "Highly Surprising" state under a 4-bin discretization. These reveal distinct dynamics in LLM-generated text, including a stronger recovery tendency and a more pronounced tendency to spike from low-surprisal contexts. (b) A heatmap of the detector's performance (AUROC) on SQuAD across different hyperparameter settings, justifying our choice of model order. (c) The final score distributions of our detector.
  • Figure 3: Effect of the number of bins $k$ on detection performance for source models including GPT-J-6B (left) and Llama-3.2-3B (right).
  • Figure 4: (a) AUROC vs. number of reference samples. The blue curve (“$k$-optimized”) picks the best $k$ at each number of reference samples; the orange/green curves fix $k \in \{7,8\}$. (b) AUROC vs. test length $n$ under different reference lengths. Solid lines are $k$-optimized, with each reference sample truncated to 50/100/200 tokens; shaded bands show the attainable range across $k$ at each $n$. (c) Detection results of 7 detection methods on 6 test lengths.
  • Figure 5: (a-b) AUROC contour maps (WritingPrompts/Gemma-7B). Left: $k=7$; right: $k=8$. The x-axis is reference length (tokens) and the y-axis is test length (tokens). Colors encode AUROC. In both panels, contours tilt up-right, indicating a trade-off: larger reference length allows smaller test length at similar performance. (c) AUROC vs. proxy model.
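The online scoring step of Figure 1 can be sketched as follows. The paper defines its own GJS statistic, so the version below is an assumption. It is modeled on Gutman's classical generalized Jensen-Shannon statistic and applied row-wise to transition counts. Let $T$ be the test count matrix with $n$ transitions and $R$ a reference count matrix, and write $M_{ij}=(T_{ij}+R_{ij})/(T_{i\cdot}+R_{i\cdot})$. Then $\mathrm{GJS}(T,R)=\frac{1}{n}\sum_{i,j}\big[T_{ij}\log\frac{T_{ij}}{T_{i\cdot}M_{ij}}+R_{ij}\log\frac{R_{ij}}{R_{i\cdot}M_{ij}}\big]$, which is $1/n$ times a log generalized likelihood ratio between the hypotheses "different chains" and "same chain". The sign convention of the hypothetical `delta_gjs` (positive means closer to the machine reference) is also assumed.

```python
import math
from typing import Sequence

Counts = Sequence[Sequence[int]]


def gjs(test: Counts, ref: Counts) -> float:
    """Assumed Markov-chain GJS between test and reference transition counts.

    Uses the row-wise pooled estimate M_ij = (T_ij + R_ij) / (T_i + R_i) and
    normalizes both log-likelihood-ratio terms by n, the number of test transitions.
    """
    n = sum(sum(row) for row in test)
    if n == 0:
        raise ValueError("test passage has no transitions")
    total = 0.0
    for t_row, r_row in zip(test, ref):
        t_tot, r_tot = sum(t_row), sum(r_row)
        for t, r in zip(t_row, r_row):
            if t + r == 0:
                continue  # cell unseen in both: contributes nothing
            m = (t + r) / (t_tot + r_tot)
            if t > 0:
                total += t * math.log(t / (t_tot * m))
            if r > 0:
                total += r * math.log(r / (r_tot * m))
    return total / n


def delta_gjs(test: Counts, human_ref: Counts, machine_ref: Counts) -> float:
    """Positive when the test passage is closer to the machine reference (assumed convention)."""
    return gjs(test, human_ref) - gjs(test, machine_ref)
```

In this sketch, only the test counts change per input. The human and machine reference count matrices are built once offline, so scoring a passage costs one proxy-LM pass plus an $O(k^2)$ sum.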

Theorems & Definitions (20)

  • Proposition 4.1
  • Theorem 4.2
  • Proposition 4.3
  • Theorem 4.4: Asymptotic normality of $\Delta\mathrm{GJS}_n$ (informal)
  • Lemma A2.3: Approximate Lipschitz Property of the $f$-Divergence (Lemma 20 in pillutla2023mauvescoresgenerativemodels)
  • Proposition A2.5: Quantization Error of $f$-Divergence (Proposition 13 in pillutla2023mauvescoresgenerativemodels)
  • Lemma A2.6: Row-wise TV Bound (wolfer2023empiricalinstancedependentestimationmarkov)
  • Lemma A2.7: Missing Mass Bound (Theorem 1 in skorski2020missingmassconcentrationmarkov)
  • Lemma A2.8: Theorem 3.1 in chung2012chernoffhoeffdingboundsmarkovchains
  • Lemma A2.9
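Theorem 4.4 states that $\Delta\mathrm{GJS}_n$ is asymptotically normal. One practical use of such a result is to set a decision threshold for a target false-positive rate from the mean and standard deviation of scores on held-out human passages. This is sketched below under our own assumptions, not as the paper's procedure. `gaussian_threshold` and `flag_machine` are hypothetical helpers built on Python's `statistics.NormalDist`.

```python
from statistics import NormalDist
from typing import Sequence


def gaussian_threshold(human_scores: Sequence[float], target_fpr: float) -> float:
    """Fit a normal distribution to Delta-GJS scores of held-out human passages
    and return its (1 - target_fpr) quantile (assumes higher = more machine-like)."""
    if not 0.0 < target_fpr < 1.0:
        raise ValueError("target_fpr must be in (0, 1)")
    dist = NormalDist.from_samples(human_scores)  # needs at least two scores
    return dist.inv_cdf(1.0 - target_fpr)


def flag_machine(score: float, threshold: float) -> bool:
    """Flag a passage as machine-generated when its score exceeds the threshold."""
    return score > threshold
```

A Gaussian fit is only as good as the normal approximation at the test length in use. For short passages, an empirical quantile of the human scores is a safer fallback.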