Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection

Jiaqi Chen, Xiaoye Zhu, Tianyang Liu, Ying Chen, Xinhui Chen, Yiwen Yuan, Chak Tou Leong, Zuchao Li, Tang Long, Lei Zhang, Chenyu Yan, Guanghao Mei, Jie Zhang, Lefei Zhang

TL;DR

Detecting machine-revised text, where human-written content is blended with machine edits, is harder than detecting purely machine-generated text. ImBD tackles this by first imitating machine-revision style through Style Preference Optimization (SPO), which aligns a scoring model to favor machine-like phrasing, and then measuring the style-conditional probability curvature (Style-CPC) to distinguish revised content. Evaluated across six target LLMs, four domains, and three revision types, the approach improves AUC over state-of-the-art detectors by 13% on text revised by open-source LLMs, 5% on GPT-3.5, and 19% on GPT-4o, while needing only 1,000 samples and five minutes of SPO. This style-centric detection is relevant to academic integrity, misinformation control, and content verification.

Abstract

Large Language Models (LLMs) have revolutionized text generation, making the detection of machine-generated text increasingly challenging. Although past methods perform well on purely machine-generated text, they perform poorly at distinguishing machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from its original human prompt. Because the content may originate from human prompts, detecting machine-revised text often involves identifying distinctive machine styles, e.g., words favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within content contributed by humans. We propose the "Imitate Before Detect" (ImBD) approach, which first imitates the machine-style token distribution and then compares the distribution of the text under test with the machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce style preference optimization (SPO), which aligns a scoring LLM with the stylistic preferences of machine-generated text. The aligned scoring model is then used to calculate the style-conditional probability curvature (Style-CPC), which quantifies the log-probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5 and GPT-4o revised text, respectively. Notably, our method surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness.
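
As a rough illustration of the curvature idea described above, the sketch below compares the log probability of the observed tokens with what the scoring model would expect from its own conditional samples, normalized by the standard deviation. It assumes the analytic normalization used by Fast-DetectGPT-style detectors; the paper's exact Style-CPC definition may differ. The function names and the toy logits are hypothetical, and in practice the logits would come from the SPO-aligned scoring model.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one position's logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def conditional_probability_curvature(logits, token_ids):
    """Curvature score for one text (hypothetical helper, not the authors' code).

    logits:    list of per-position logit rows from the scoring model,
               where row n scores candidate tokens given x_{0:n-1}.
    token_ids: the token actually observed at each position.
    """
    observed = 0.0  # log probability of the text as written
    expected = 0.0  # expected log probability of conditionally sampled tokens
    variance = 0.0  # variance of that log probability, summed over positions
    for row, tok in zip(logits, token_ids):
        log_p = log_softmax(row)
        mean = sum(math.exp(lp) * lp for lp in log_p)
        second_moment = sum(math.exp(lp) * lp * lp for lp in log_p)
        observed += log_p[tok]
        expected += mean
        variance += second_moment - mean * mean
    return (observed - expected) / math.sqrt(variance)


# Toy example: 3 positions, vocabulary of 3 tokens (made-up numbers).
toy_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [-0.5, 0.0, 2.2]]
favored_tokens = [0, 1, 2]     # tokens the scorer likes most at each position
disfavored_tokens = [2, 0, 0]  # tokens the scorer likes least at each position

score_favored = conditional_probability_curvature(toy_logits, favored_tokens)
score_disfavored = conditional_probability_curvature(toy_logits, disfavored_tokens)
```

Under this sketch, text whose tokens the machine-style scorer prefers gets a higher score than text it finds unlikely, so a threshold on the score could serve as the detection rule.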

Paper Structure

This paper contains 45 sections, 10 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: (a-c) Comparative examples of human-written, machine-generated, and machine-revised text. (d) Fast-DetectGPT shows a significant drop in detection accuracy when identifying machine-revised text compared to machine-generated text. (e) Our method brings a noticeable improvement in detecting machine-revised text compared to Fast-DetectGPT. "Fast-Det." denotes "Fast-DetectGPT".
  • Figure 2: Impact of Style-conditional probability curvatures (Style-CPC). (Left) Conditional probability curvatures (CPC) from Fast-DetectGPT (denoted as "Fast-Det.") applied to purely machine-generated text; (Middle) Conditional probability curvatures applied to purely machine-revised text; (Right) Style-conditional probability curvatures from ours applied to machine-revised text. The greater the separation between human-written texts (red) and machine-revised texts (blue), the more effective the detection.
  • Figure 3: Imitating the stylistic preferences of LLMs. (a) Token distribution before and after machine-style imitation, demonstrating a deliberate fine-tuning of the scoring model to bias its token distribution towards a machine writing style (e.g., shifting preferences from common words like "explore" to machine-favored tokens such as "delve"). (b) The pipeline of Style Preference Optimization is applied to align the base scoring model with the style of machine-revised content using paired human-machine texts. This results in a machine-style scoring model, which generates token distributions $p(x_n|x_{0:n-1})$ for each position $n$, subsequently used for style-conditional probability curvature calculations.
  • Figure 4: Evaluations of detection accuracy for XSum polished texts trimmed to the specified word count.
  • Figure 5: ROC curves in log scale evaluated on the polish task of the XSum dataset, where the dashed lines denote the random classifier. "Fast-Det." denotes "Fast-DetectGPT".
  • ...and 1 more figures
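
The Figure 3 caption describes SPO as aligning the base scoring model with machine style using paired human-machine texts. The text here does not spell out the objective, so the sketch below assumes a DPO-style pairwise preference loss in which the machine-revised text is the preferred sample and its human original is the rejected one. The function name, the `beta` value, and the example log probabilities are all hypothetical.

```python
import math


def pairwise_preference_loss(policy_machine_lp, policy_human_lp,
                             ref_machine_lp, ref_human_lp, beta=0.1):
    """DPO-style loss for one (machine-revised, human) text pair.

    Each argument is a sequence log probability: `policy_*` from the scoring
    model being aligned, `ref_*` from a frozen copy of the base model.
    `beta` (an assumed hyperparameter) scales the preference margin.
    """
    margin = beta * ((policy_machine_lp - ref_machine_lp)
                     - (policy_human_lp - ref_human_lp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log probabilities: the policy has shifted toward the machine style.
loss_aligned = pairwise_preference_loss(-8.0, -14.0, -10.0, -12.0)
# The policy is identical to the reference, so there is no preference yet.
loss_neutral = pairwise_preference_loss(-10.0, -12.0, -10.0, -12.0)
```

Minimizing a loss of this shape pushes the scoring model to assign relatively more probability to the machine-revised version of each pair, which is one way to obtain the machine-style token distributions that the curvature calculation relies on.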