Detecting Machine-Generated Long-Form Content with Latent-Space Variables

Yufei Tian, Zeyu Pan, Nanyun Peng

TL;DR

This work tackles the problem of reliably detecting machine-generated long-form content in the presence of domain shifts and targeted attacks. It argues that token-space zero-shot detectors falter under varied decoding and prompting, and introduces a latent-variable framework that leverages discourse structures, especially event triggers and transitions, to distinguish machine-generated from human-written text. A latent-space language model trained on human latent sequences, combined with a dual-criterion detector that fuses latent and token-space signals, yields robust performance across movie, news, and scientific writing, achieving a notable AUROC improvement over strong baselines like DetectGPT. The findings reveal that modern LLMs (e.g., GPT-4) exhibit distinct patterns in event selection and transitions compared to humans, supporting the proposed approach while highlighting limitations in explicit planning-based mimicry and the need for improved latent-structure extraction in specialized domains.

Abstract

The increasing capability of large language models (LLMs) to generate fluent long-form texts is presenting new challenges in distinguishing machine-generated outputs from human-written ones, which is crucial for ensuring authenticity and trustworthiness of expressions. Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts, including different prompting and decoding strategies, and adversarial attacks. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts by training a latent-space model on sequences of events or topics derived from human-written texts. In three different domains, machine-generated texts, which are originally inseparable from human texts on the token level, can be better distinguished with our latent-space model, leading to a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that, unlike humans, modern LLMs like GPT-4 generate event triggers and their transitions differently, an inherent disparity that helps our method to robustly detect machine-generated texts.
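
As a rough illustration of the latent-space idea described above, the sketch below fits a smoothed bigram model on sequences of event tags taken from human-written texts and scores new sequences by perplexity. Everything in it is an assumption made for illustration: the event labels, the toy corpus, the add-one smoothing, and the function names `train_bigram` and `latent_perplexity` are hypothetical, and the bigram model is a simple stand-in for the paper's latent-space language model, not the authors' implementation.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count tag bigrams over latent sequences (hypothetical event labels)."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def latent_perplexity(seq, model):
    """Perplexity of one latent sequence under the add-one-smoothed bigram model."""
    bigrams, contexts, vocab = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    v = len(vocab) + 1  # one extra slot reserves probability mass for unseen tags
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        log_prob += math.log(p)
    n_transitions = len(padded) - 1
    return math.exp(-log_prob / n_transitions)

# Toy "human" event sequences; purely illustrative, not data from the paper.
human_sequences = [
    ["meet", "argue", "leave", "return", "reconcile"],
    ["meet", "argue", "leave", "regret", "return"],
    ["arrive", "meet", "argue", "reconcile"],
]
model = train_bigram(human_sequences)

# A sequence that follows transitions seen in the human data ...
ppl_seen = latent_perplexity(["meet", "argue", "leave", "return"], model)
# ... versus one whose transitions never occur in the human data.
ppl_odd = latent_perplexity(["reconcile", "arrive", "regret", "meet"], model)
```

Under these assumptions, a sequence whose event transitions are unusual relative to the human data receives a higher latent perplexity, which is the kind of signal the abstract describes using alongside token-level statistics.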

Paper Structure (39 sections, 4 equations, 9 figures, 9 tables, 1 algorithm)

Figures (9)

  • Figure 1: (a) Existing zero-shot detectors that rely on token distributions (observation-space statistics) are not robust to various real-world scenarios such as high decoding temperature, complex prompts, and adversarial attacks. (b) Our detector with latent features (e.g., discourse tags) is more robust to these changes.
  • Figure 2: Neither logit-based nor perturbation-based detectors are robust to changes in decoding, variations in prompting style, or adversarial attacks.
  • Figure 3: Left and middle: kernel density plots of the sample-space curvature and latent-space PPL across five test sets in the news domain. These include human-written texts (collected from multiple sources) and machine-generated texts under four different configurations. The plots reveal complementary strengths: 1) the sample-space curvature effectively distinguishes only machine outputs generated under typical settings and fails to identify outputs generated with complex prompts or after paraphrasing/edit attacks; 2) the latent-space PPL excels at distinguishing those non-standard settings. Right: considering both criteria leads to the most robust detection performance.
  • Figure 4: Generative process with latent variables.
  • Figure 5: 2D density clouds. For better readability, we only show four sets of machine-generated outputs.
  • ...and 4 more figures