Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs
Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang
TL;DR
The paper addresses the problem of distinguishing human-written from machine-generated text by exploiting hierarchical discourse patterns. It models documents as recursive hypergraphs derived from Rhetorical Structure Theory (RST) trees and analyzes discourse motifs (unions of triads) with three metrics: motif frequency (MF), weighted average depth (WAD), and MF-IDF. Together these capture structure beyond the surface text. Empirical results show that adding hierarchical discourse motifs improves detection of machine-generated text across multiple baselines and datasets, including out-of-domain and paraphrased samples, and that human writing exhibits greater structural variability. The approach supports robust analysis of long-form text and offers insight into how discourse structure differs between human-written and machine-generated content, with practical implications for detectors and for studying domain-specific writing patterns. The work also introduces TenPageStories, a dataset for studying long-form generation, and shows that motifs provide meaningful, interpretable signals aligned with discourse structure.
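To make the MF and MF-IDF metrics named in the TL;DR more concrete, here is a minimal sketch by analogy with TF-IDF. It assumes each document has already been reduced to a list of motif labels; the label strings, the function names `motif_frequency` and `mf_idf`, and the unsmoothed logarithmic IDF are illustrative assumptions, not the paper's exact definitions. WAD is omitted because it requires per-motif depth information from the discourse tree.

```python
import math
from collections import Counter


def motif_frequency(doc_motifs):
    """Relative frequency of each motif label within one document."""
    counts = Counter(doc_motifs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: count / total for motif, count in counts.items()}


def mf_idf(corpus):
    """TF-IDF-style weighting applied to motifs (an assumed formulation).

    corpus: list of documents, each a list of motif labels.
    Returns one {motif: score} dict per document.
    """
    n_docs = len(corpus)
    # Number of documents in which each motif occurs at least once.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus:
        mf = motif_frequency(doc)
        scores.append(
            {m: f * math.log(n_docs / doc_freq[m]) for m, f in mf.items()}
        )
    return scores


# Hypothetical motif labels, for illustration only.
example_corpus = [
    ["Elaboration-Joint", "Elaboration-Joint", "Contrast-Cause"],
    ["Contrast-Cause", "Background-Elaboration"],
]
example_scores = mf_idf(example_corpus)
```

Under this formulation a motif that appears in every document scores zero, so the weighting emphasizes motifs that are characteristic of particular documents or groups of documents.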
Abstract
With the advent of large language models (LLMs), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper investigates whether discernible and unique linguistic properties can be identified in texts written by humans, focusing on the underlying discourse structures of texts beyond their surface structures. Introducing a novel methodology, we leverage hierarchical parse trees and recursive hypergraphs to unveil distinctive discourse patterns in texts produced by both LLMs and humans. Empirical findings demonstrate that, although both LLMs and humans generate distinct discourse patterns influenced by specific domains, human-written texts exhibit more structural variability, reflecting the nuanced nature of human writing in different domains. Notably, incorporating hierarchical discourse features enhances binary classifiers' overall performance in distinguishing between human-written and machine-generated texts, even on out-of-distribution and paraphrased samples. This underscores the significance of incorporating hierarchical discourse features in the analysis of text patterns. The code and dataset are available at https://github.com/minnesotanlp/threads-of-subtlety.
