Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness

Shixuan Ma, Quan Wang

TL;DR

The authors devise TOCSIN, a generic dual-channel detection paradigm that uses token cohesiveness as a plug-and-play module to improve existing zero-shot detectors, and they show that LLM-generated text tends to exhibit higher token cohesiveness than human-written text.

Abstract

The increasing capability and widespread usage of large language models (LLMs) highlight the desirability of automatic detection of LLM-generated text. Zero-shot detectors, due to their training-free nature, have received considerable attention and achieved notable success. In this paper, we identify a new feature, token cohesiveness, that is useful for zero-shot detection, and we demonstrate that LLM-generated text tends to exhibit higher token cohesiveness than human-written text. Based on this observation, we devise TOCSIN, a generic dual-channel detection paradigm that uses token cohesiveness as a plug-and-play module to improve existing zero-shot detectors. To calculate token cohesiveness, TOCSIN requires only a few rounds of random token deletion and semantic difference measurement, making it particularly suitable for a practical black-box setting where the source model used for generation is not accessible. Extensive experiments with four state-of-the-art base detectors on various datasets, source models, and evaluation settings demonstrate the effectiveness and generality of the proposed approach. Code is available at https://github.com/Shixuan-Ma/TOCSIN.
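The "random token deletion and semantic difference measurement" procedure can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: tokens are whitespace-separated words, the semantic difference is a placeholder Jaccard distance over token sets (a real system would likely use a learned semantic measure), and the names `token_cohesiveness`, `jaccard_difference`, `rounds`, and `delete_ratio`, along with their default values, are hypothetical.

```python
import random
from typing import Callable, List, Optional


def jaccard_difference(a: List[str], b: List[str]) -> float:
    """Placeholder semantic difference: 1 - Jaccard overlap of token sets.

    Assumption: this only stands in for a real semantic measure.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def token_cohesiveness(
    text: str,
    difference: Callable[[List[str], List[str]], float] = jaccard_difference,
    rounds: int = 10,           # illustrative value, not from the paper
    delete_ratio: float = 0.1,  # illustrative value, not from the paper
    seed: Optional[int] = None,
) -> float:
    """Mean difference between the text and randomly token-deleted copies."""
    tokens = text.split()  # assumption: whitespace tokenization
    if not tokens:
        return 0.0
    rng = random.Random(seed)
    # Delete at least one token per round, never more than the text holds.
    n_delete = min(len(tokens), max(1, round(len(tokens) * delete_ratio)))
    total = 0.0
    for _ in range(rounds):
        dropped = set(rng.sample(range(len(tokens)), n_delete))
        kept = [tok for i, tok in enumerate(tokens) if i not in dropped]
        total += difference(tokens, kept)
    return total / rounds
```

Passing the difference measure as a callable keeps the sketch independent of any particular scoring model, and nothing in it needs access to the source model that generated the text, which is what makes the procedure fit a black-box setting.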

Paper Structure (36 sections, 10 equations, 7 figures, 9 tables, 1 algorithm)


Figures (7)

  • Figure 1: Histograms of token cohesiveness distributions for 500 human-written and 500 LLM-generated articles. Human-written articles are sampled from XSum (Narayan et al., 2018), and LLM-generated articles are produced by prompting four source models with the first 30 tokens of each human-written article. The calculation of token cohesiveness is detailed later in the paper.
  • Figure 2: Overview of TOCSIN. The input text $x$ is fed into the upper channel to calculate token cohesiveness $u(x)$, and the lower channel to produce raw prediction $v(x)$. The two scores are then combined into $w(x)$, and if the combination exceeds a predefined threshold $\epsilon$, the text $x$ is categorized as LLM-generated.
  • Figure 3: Heatmaps of the Pearson correlation coefficient between scores from different detectors, averaged across XSum, SQuAD, and WritingPrompts and across five open-source models. Lighter colors indicate weaker correlation, while darker colors indicate stronger correlation.
  • Figure 4: Distributions of token cohesiveness for 150 human-written and 150 ChatGPT-generated passages from WritingPrompts, truncated to target length.
  • Figure 5: AUROC for detecting ChatGPT and GPT-4 passages on WritingPrompts truncated to target length.
  • ...and 2 more figures
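
The dual-channel flow described in the Figure 2 caption can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how $u(x)$ and $v(x)$ are combined into $w(x)$, so the default rule below ($w = e^{u} \cdot v$) is an assumed placeholder, and the names `dual_channel_detect` and `combine` are hypothetical. Both channels are passed in as plain callables.

```python
import math
from typing import Callable, Tuple


def dual_channel_detect(
    text: str,
    cohesiveness: Callable[[str], float],   # upper channel, u(x)
    base_detector: Callable[[str], float],  # lower channel, v(x)
    epsilon: float,                         # predefined threshold
    # Assumption: the combination rule is a placeholder, not from the caption.
    combine: Callable[[float, float], float] = lambda u, v: math.exp(u) * v,
) -> Tuple[bool, float]:
    """Return (is_llm_generated, w) for the given text."""
    u = cohesiveness(text)
    v = base_detector(text)
    w = combine(u, v)
    # The text is flagged as LLM-generated when w exceeds the threshold.
    return w > epsilon, w


# Usage with stub channels that return fixed scores.
flagged, score = dual_channel_detect(
    "some passage",
    cohesiveness=lambda t: 0.0,
    base_detector=lambda t: 2.0,
    epsilon=1.5,
)
```

Because the base detector is just a callable, any existing zero-shot detector that returns a scalar score can be dropped into the lower channel, which mirrors the plug-and-play framing of the abstract.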