Detecting Machine-Generated Long-Form Content with Latent-Space Variables
Yufei Tian, Zeyu Pan, Nanyun Peng
TL;DR
This work addresses the reliable detection of machine-generated long-form content under domain shifts and targeted attacks. It argues that token-space zero-shot detectors falter when decoding and prompting strategies vary, and introduces a latent-variable framework that uses discourse structure, especially event triggers and their transitions, to distinguish machine-generated from human-written text. A latent-space language model trained on human latent sequences, combined with a dual-criterion detector that fuses latent-space and token-space signals, performs robustly across movie, news, and scientific writing, achieving a 31% AUROC improvement over strong baselines such as DetectGPT. The findings show that modern LLMs (e.g., GPT-4) select events and transition between them differently from humans, which supports the proposed approach; they also highlight the limitations of explicit planning-based mimicry and the need for better latent-structure extraction in specialized domains.
Abstract
The increasing capability of large language models (LLMs) to generate fluent long-form texts presents new challenges in distinguishing machine-generated outputs from human-written ones, which is crucial for ensuring the authenticity and trustworthiness of expressions. Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to adversarial attacks and to real-world domain shifts such as different prompting and decoding strategies. We propose a more robust method that uses abstract elements, such as event transitions, as key deciding factors for separating machine from human texts: we train a latent-space model on sequences of events or topics derived from human-written texts. In three different domains, machine-generated texts that are inseparable from human texts at the token level are better distinguished by our latent-space model, yielding a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that modern LLMs such as GPT-4 generate event triggers and their transitions differently from humans, an inherent disparity that helps our method robustly detect machine-generated texts.
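To make the idea of scoring text in a latent space concrete, the following is a minimal sketch, not the model used in this work. It assumes that event triggers have already been extracted into lists of strings, swaps in a smoothed bigram model for the latent-space language model, and uses a simple weighted sum as a stand-in for the dual-criterion fusion. The names `train_bigram`, `latent_nll`, `fused_score`, and the `weight` parameter are all hypothetical, as is the toy data.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count event-to-event transitions in human-written latent sequences."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, curr in zip(padded, padded[1:]):
            bigrams[(prev, curr)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def latent_nll(seq, model):
    """Average negative log-likelihood of a latent sequence (add-one smoothing).

    A higher value means the event transitions look less like the
    human sequences the model was trained on.
    """
    bigrams, contexts, vocab = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    v = len(vocab) + 1  # one extra slot for unseen events
    total = 0.0
    for prev, curr in zip(padded, padded[1:]):
        p = (bigrams[(prev, curr)] + 1) / (contexts[prev] + v)
        total -= math.log(p)
    return total / (len(padded) - 1)

def fused_score(latent_score, token_score, weight=0.5):
    """Hypothetical fusion: a weighted sum of the latent and token criteria."""
    return weight * latent_score + (1 - weight) * token_score

# Toy event-trigger sequences standing in for human-written documents.
human_sequences = [
    ["meet", "argue", "leave", "return", "reconcile"],
    ["meet", "argue", "fight", "leave"],
    ["arrive", "meet", "argue", "leave", "return"],
]
model = train_bigram(human_sequences)

typical = latent_nll(["meet", "argue", "leave", "return"], model)
atypical = latent_nll(["return", "meet", "reconcile", "argue"], model)
```

In this toy setup, `atypical` comes out higher than `typical` because its transitions never occur in the training sequences. A real detector would need an actual event extractor, a stronger sequence model, and a token-space score (for example, from a detector such as DetectGPT) to pass as `token_score`.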
