Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs

Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang

TL;DR

The paper tackles the challenge of distinguishing human-written from machine-generated texts by leveraging hierarchical discourse patterns. It models documents as recursive hypergraphs derived from RST trees and analyzes discourse motifs (unions of triads) using the MF, WAD, and MF-IDF metrics to capture structure beyond the surface text. Empirical results show that incorporating hierarchical discourse motifs improves authorship detection across multiple baselines and datasets, including out-of-domain and paraphrased samples, with human writing exhibiting greater structural variability. The approach supports analysis of long-form text and offers insight into how discourse structure differentiates human-written from machine-generated content, with practical implications for detectors and for studying domain-specific writing patterns. The work also introduces TenPageStories to study long-form generation and shows that motifs contribute meaningful, interpretable signals aligned with discourse structure.

Abstract

With the advent of large language models (LLMs), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper investigates whether texts written by humans have discernible and unique linguistic properties, focusing on the underlying discourse structures of texts beyond their surface structures. We introduce a novel methodology that uses hierarchical parse trees and recursive hypergraphs to reveal distinctive discourse patterns in texts produced by both LLMs and humans. Empirical findings show that, although both LLMs and humans generate distinct discourse patterns influenced by specific domains, human-written texts exhibit more structural variability, reflecting the nuanced nature of human writing in different domains. Notably, incorporating hierarchical discourse features improves binary classifiers' overall performance in distinguishing human-written from machine-generated texts, even on out-of-distribution and paraphrased samples. This underscores the value of hierarchical discourse features in the analysis of text patterns. The code and dataset are available at https://github.com/minnesotanlp/threads-of-subtlety.

Paper Structure

This paper contains 41 sections, 1 theorem, 8 equations, 18 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

The transformed graph of an RST tree consists only of a union of triangle graphs.
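
To make the statement concrete, here is a minimal sketch of one way a binary RST tree could be turned into a union of triangles. The transformation used here is an assumption for illustration, not the paper's definition: each internal node is linked to its two children, and the two children are linked to each other, so every parent and its child pair contributes one triangle. The names `RSTNode`, `to_triangles`, and `edges_of` are hypothetical, and nodes are identified by the EDU span they cover.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional, Tuple


@dataclass
class RSTNode:
    """A node of a binary RST tree, identified by the EDU span it covers."""
    span: Tuple[int, int]                 # (first EDU index, last EDU index)
    left: Optional["RSTNode"] = None
    right: Optional["RSTNode"] = None


def to_triangles(root: RSTNode):
    """Collect one (parent, left, right) triangle per internal node.

    Assumed transformation: parent-left, parent-right, and left-right
    edges are all present, so each internal node yields a triangle.
    """
    triangles = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.left is not None and node.right is not None:
            triangles.append((node.span, node.left.span, node.right.span))
            stack.extend([node.left, node.right])
    return triangles


def edges_of(triangles):
    """Return the undirected edge set of the union of the given triangles."""
    edges = set()
    for tri in triangles:
        for a, b in combinations(tri, 2):
            edges.add(frozenset((a, b)))
    return edges


# Toy tree over three EDUs: the root covers 1-3, its right child covers 2-3.
tree = RSTNode(
    (1, 3),
    RSTNode((1, 1)),
    RSTNode((2, 3), RSTNode((2, 2)), RSTNode((3, 3))),
)
triangles = to_triangles(tree)
edges = edges_of(triangles)
print(triangles)
print(len(edges))
```

Under this assumed construction, every edge of the graph is created as part of some triangle, so the graph is a union of triangles by construction. In the toy tree the two triangles share only the node covering EDUs 2-3.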

Figures (18)

  • Figure 1: Human writers often employ hierarchical linguistic structures in writing, whereas LLMs primarily operate through sequential next-token prediction.
  • Figure 2: Difference in motif distribution between machine-generated and human-written texts for the Yelp domain. Below are the top discourse motifs for each authorship and their corresponding discourse relations, with examples. Text enclosed in angle brackets denotes the EDUs involved in the relation.
  • Figure 3: A quote from Steve Jobs and its RST tree converted into a hypergraph form. A hexagonal node represents the "nucleus" node, while a circular one denotes the "satellite" node. Each node is labeled with a span of EDU indices that it covers. The star-shaped node is the root node of the graph, encompassing all subgraphs and EDUs.
  • Figure 4: A data flow diagram illustrating how various types of input features are fed into the three baseline models: Graph Attention Network (GAT), Random Forest (RF), and Longformer (LF).
  • Figure 5: Difference distribution of motifs for TenPageStories under the fill-in-the-gap settings. The x-axis represents unique indices of single-triads while the y-axis shows the difference in motif frequency (scaled by 1e-3) of machine-generated and human-written texts for each motif. Discourse motifs indexed at 0 (Elaboration), 5 (Joint), 7 (Joint), and 28 (Temporal) seem to be useful in distinguishing the two groups.
  • ...and 13 more figures
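
The per-motif difference plotted in Figure 5 can be illustrated with a small sketch. It assumes each document has already been reduced to a list of motif indices and takes motif frequency to be a motif's relative count within a document, averaged over documents. This excerpt does not give the paper's exact definition, so treat that as an assumption. The function names and the toy motif lists are made up for illustration.

```python
from collections import Counter


def motif_frequency(docs):
    """Average relative frequency of each motif index over a set of documents.

    `docs` is a list of documents, each a list of motif indices.
    Empty documents are skipped (an assumption of this sketch).
    """
    docs = [d for d in docs if d]
    totals = Counter()
    for motifs in docs:
        for motif, count in Counter(motifs).items():
            totals[motif] += count / len(motifs)
    return {motif: total / len(docs) for motif, total in totals.items()}


def frequency_difference(machine_docs, human_docs):
    """Machine-minus-human motif frequency for every motif seen in either set."""
    mf_machine = motif_frequency(machine_docs)
    mf_human = motif_frequency(human_docs)
    motif_ids = sorted(set(mf_machine) | set(mf_human))
    return {m: mf_machine.get(m, 0.0) - mf_human.get(m, 0.0) for m in motif_ids}


# Toy data only: motif indices per document, not taken from the paper.
machine = [[0, 0, 5, 7], [0, 5, 5, 7]]
human = [[0, 28, 28, 7], [5, 28, 7, 7]]

diff = frequency_difference(machine, human)
for motif_id, value in diff.items():
    print(motif_id, round(value, 3))
```

A positive value means the motif is more frequent in the machine-generated set, a negative value that it is more frequent in the human-written set. Motifs with large absolute differences are the ones a detector could plausibly exploit.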

Theorems & Definitions (2)

  • Definition 1
  • Theorem 1