Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective

Chenwang Wu, Yiu-ming Cheung, Bo Han, Defu Lian

TL;DR

The paper addresses boundary fuzziness in machine-generated text (MGT) detection by treating hard labels as inexact, a consequence of human–machine interplay and detector superintelligence. It introduces an easy-to-hard supervision framework in which a supervisor trained on the easier task of longer-text detection guides the detector on the harder target task, supported by theoretical bounds linking supervisor performance to detector performance. Empirically, the method improves robustness and cross-domain generalization across diverse datasets and attacks, and outperforms standard knowledge distillation. The approach is presented as scalable and efficient, and it aligns the detector more closely with the underlying "golden" labels, which should make MGT detection systems more reliable.

Abstract

Existing machine-generated text (MGT) detection methods implicitly treat labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, limitations of human cognition and the superintelligence of detectors make inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor that targets the relatively simple task of longer-text detection (despite its weaker capabilities) to enhance the more challenging target detector. Firstly, the longer texts targeted by the supervisor theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Secondly, by structurally incorporating the detector into the supervisor, we theoretically model the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "golden" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed text, and paraphrase attacks, demonstrate the framework's significant detection effectiveness. The code is available at: https://github.com/tmlr-group/Easy2Hard.

Paper Structure

This paper contains 43 sections, 9 theorems, 26 equations, 24 figures, 36 tables, and 1 algorithm.

Key Result

Theorem 3.1

Let $h(s)$ and $m(s)$ be the distributions of human-generated and machine-generated sequences over $s\in\mathcal{S}$, respectively, with total variation distance $TV(m,h)=\delta>0$. For a text containing $n$ sequences, let $\alpha\ge 0$ denote the ratio of human-like component incorporated in MGT
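The intuition behind the theorem's title ("Distribution Difference for Longer Text") can be checked numerically on a toy case. The sketch below is not the paper's bound, whose statement is cut off above. It only illustrates the standard fact that the total variation distance between two distributions does not shrink, and typically grows, when a text is made of more sequences. The distributions `h` and `m`, the three-symbol alphabet, and the assumption that the sequences are drawn i.i.d. are all hypothetical choices made for illustration, and the mixing ratio $\alpha$ is not modelled.

```python
from itertools import product
from math import prod


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def product_distribution(p, n):
    """Distribution of n i.i.d. draws from p, keyed by tuples of outcomes."""
    return {seq: prod(p[s] for s in seq) for seq in product(p, repeat=n)}


# Hypothetical toy distributions over three "sequence types" (assumption).
h = {"a": 0.5, "b": 0.3, "c": 0.2}  # human-generated
m = {"a": 0.4, "b": 0.3, "c": 0.3}  # machine-generated, TV(m, h) = 0.1

# Exact TV distance between texts made of n sequences, for n = 1..6.
tv_by_length = {
    n: tv_distance(product_distribution(h, n), product_distribution(m, n))
    for n in range(1, 7)
}

for n, tv in tv_by_length.items():
    print(f"n={n}: TV={tv:.4f}")
```

Under these assumptions the distance never decreases as $n$ grows, which is consistent with the claim that longer texts are easier to tell apart. The exact enumeration is exponential in $n$, so it only suits very small examples.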

Figures (24)

  • Figure 1: Boundary fuzziness evaluation between (mixed) MGT and HGT, which illustrates the latent space distribution and prediction confidence distribution under pure (Sub-Fig. 1 & 2) and mixed (Sub-Fig. 3 & 4) texts. The mixed text is obtained by replacing 1/4 of MGTs with HGTs.
  • Figure 2: Performance comparison with and without using soft labels in mixed text (1/4 of MGT was replaced with HGT). The detector is ChatGPT-D [guo2023close].
  • Figure 3: The easy-to-hard supervision framework, which uses a carefully designed supervisor, focused on the relatively simple task of longer-text detection, to guide a more challenging target detector.
  • Figure 4: Test performance (TPR@FPR-1%) under various LLM mixed texts. Detectors are trained on text generated by PaLM. In each sub-figure, the left group shows detectors trained on mixed text and the right group shows detectors trained on original text.
  • Figure 5: Robustness (TPR@FPR-1%) against paraphrasing attacks (Back Translation and Polish). Detectors are trained on PaLM texts and tested on paraphrased texts from various LLMs.
  • ...and 19 more figures
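
The captions for Figures 4 and 5 report TPR@FPR-1%, the true positive rate when the decision threshold is set so that at most 1% of human-written texts are flagged. A minimal sketch of that computation follows. The function name `tpr_at_fpr`, the label convention (1 for machine-generated, 0 for human-generated), the convention that a higher score means more machine-like, and the synthetic scores are all assumptions for illustration, not the paper's evaluation code.

```python
def tpr_at_fpr(scores, labels, target_fpr=0.01):
    """True positive rate at the strictest threshold whose FPR <= target_fpr.

    Assumes label 1 = machine-generated, label 0 = human-generated, and that
    a text is flagged as machine-generated when its score exceeds the threshold.
    """
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one example of each class")

    # Number of human texts we may misclassify while staying within the budget.
    allowed_false_positives = int(target_fpr * len(neg))
    if allowed_false_positives >= len(neg):
        threshold = float("-inf")  # every text may be flagged
    else:
        # Flagging only scores strictly above this value keeps FPR in budget.
        threshold = neg[allowed_false_positives]

    return sum(s > threshold for s in pos) / len(pos)


# Synthetic, partially overlapping scores (hypothetical data).
human_scores = list(range(100))        # 0 .. 99
machine_scores = list(range(50, 150))  # 50 .. 149
scores = human_scores + machine_scores
labels = [0] * len(human_scores) + [1] * len(machine_scores)

tpr = tpr_at_fpr(scores, labels, target_fpr=0.01)
print(f"TPR@FPR-1% = {tpr:.2f}")
```

Using a strict comparison against the threshold keeps the realized false positive rate within the budget even when several human texts share the same score.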

Theorems & Definitions (13)

  • Theorem 3.1: Distribution Difference for Longer Text
  • Theorem 3.2: Detection Power for Longer Text
  • Theorem 3.3: Distribution Difference after HGT Distribution Collapse
  • Theorem 3.4: The Effectiveness of the Proposed Framework
  • Theorem
  • proof
  • Lemma C.1: Coupling Lemma [aldous1983random]
  • Theorem
  • proof
  • Theorem
  • ...and 3 more