Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection

Yang Li, Qiang Sheng, Zhengjia Wang, Yehan Yang, Danding Wang, Juan Cao

Abstract

The misuse of large language models (LLMs) requires precise detection of synthetic text. Existing works mainly follow binary or ternary classification settings, which at best distinguish pure human text, pure LLM text, and collaborative text. This remains insufficient for nuanced regulation, as LLM-polished human text and humanized LLM text often trigger different policy consequences. In this paper, we explore fine-grained LLM-generated text detection under a rigorous four-class setting. To handle such complexities, we propose RACE (Rhetorical Analysis for Creator-Editor Modeling), a fine-grained detection method that characterizes the distinct signatures of the creator and the editor. Specifically, RACE utilizes Rhetorical Structure Theory to construct a logic graph for the creator's foundation while extracting Elementary Discourse Unit-level features for the editor's style. Experiments show that RACE outperforms 12 baselines in identifying fine-grained types with low false alarms, offering a policy-aligned solution for LLM regulation.

Figures (4)

  • Figure 1: Illustration of our research scope. (a) A Creator-Editor framework for categorizing different types of texts in fine-grained LLM-generated text detection. (b) Comparison of the existing settings and the complex 4-class setting that we focus on in this paper.
  • Figure 2: Distribution of RST relations. (a) Divergence of Creators: Human creators build deeper rhetorical hierarchies (e.g., Attribution, Background), whereas LLMs produce flatter structures relying on surface-level relations (e.g., Elaboration, Evaluation). (b) LLM-Polished: underlying human architecture persists. (c) Humanized: underlying LLM architecture persists.
  • Figure 3: Overall architecture of RACE. Given a text piece, RACE (a) first captures both creator and editor traces through rhetorical structure construction and elementary discourse unit extraction. (b) These dual traces are then transformed into a logic-aware graph, where both linguistic expression and logical organization signals are encoded into node features via descendant span pooling and relation-aware projection. (c) Next, Rhetoric-Guided Message Passing propagates information through relation-specific aggregation with basis decomposition to capture complex rhetorical dependencies. (d) Finally, the global text representation is obtained via root pooling for classification. (A minimal code sketch of this pipeline appears after this list.)
  • Figure 4: Analysis of detection performance of CoCo and our proposed RACE across varying text lengths.
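For concreteness, the sketch below shows how the pipeline described in Figure 3 could be wired up, assuming a PyTorch implementation. The class names (BasisRelationalLayer, RaceStyleClassifier), feature dimensions, and relation inventory are illustrative assumptions rather than the authors' released code; RST parsing and EDU encoding are treated as external inputs that supply node features and relation-typed edges.

import torch
import torch.nn as nn


class BasisRelationalLayer(nn.Module):
    """One relation-aware message-passing layer with basis decomposition:
    each rhetorical relation r uses W_r = sum_b coeff[r, b] * basis[b], so
    parameters scale with num_bases rather than the number of relation types.
    (Degree normalization is omitted for brevity.)"""

    def __init__(self, dim, num_relations, num_bases=4):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(num_bases, dim, dim) * 0.02)
        self.coeffs = nn.Parameter(torch.randn(num_relations, num_bases) * 0.02)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edges):
        # h: (num_nodes, dim); edges: (src, dst, relation_id) triples.
        messages = torch.zeros_like(h)
        for src, dst, rel in edges:
            w_r = torch.einsum("b,bij->ij", self.coeffs[rel], self.bases)
            messages = messages.index_add(
                0, torch.tensor([dst]), (h[src] @ w_r).unsqueeze(0)
            )
        return torch.relu(self.self_loop(h) + messages)


class RaceStyleClassifier(nn.Module):
    """Hypothetical 4-class head: human, LLM, LLM-polished human, humanized LLM."""

    def __init__(self, dim=256, num_relations=18, num_classes=4, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [BasisRelationalLayer(dim, num_relations) for _ in range(num_layers)]
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, node_feats, edges, root_idx=0):
        h = node_feats
        for layer in self.layers:
            h = layer(h, edges)
        # Root pooling: classify from the root span's representation.
        return self.head(h[root_idx])


# Toy usage: three EDU/span nodes whose children attach to a root node via typed edges.
feats = torch.randn(3, 256)          # e.g., pooled token embeddings per EDU/span
edges = [(1, 0, 0), (2, 0, 3)]       # relation ids index rhetorical relation types
logits = RaceStyleClassifier()(feats, edges)   # shape: (4,)

Basis decomposition expresses the per-relation weights as mixtures of a small shared set of bases, which is what keeps the graph layer compact even when the full RST relation inventory is used.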