
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen

TL;DR

This paper tackles sentence-level AI-generated text detection within human-AI collaborative hybrid texts, arguing that realistic, multi-turn hybrids require finer-grained analysis than document-level approaches. It adopts a two-step pipeline (segment detection followed by segment classification) and leverages the CoAuthor dataset to compare segmentation-based strategies against naive sentence-by-sentence methods. Key findings show detection is inherently challenging due to human edits, frequent author changes, and short segments, yet a high-quality segment detector plus classifier can outperform joint models, especially for longer segments. The study provides practical guidance on choosing strategies based on average segment length and highlights future directions such as predicting segment length to guide method selection.

Abstract

This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets, which typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight that (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers select and even edit AI-generated sentences based on personal preferences, which makes the authorship of segments harder to identify; (1.2) frequent changes of authorship between neighboring sentences make it difficult for segment detectors to identify authorship-consistent segments; and (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; and (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment helps decide whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
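The two-step pipeline and the segment-length check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the convention that a boundary index marks the first sentence of a new segment, the `detect_boundaries` and `classify_segment` callables, and the `threshold` parameter are all assumptions introduced here. The abstract does not give a concrete cutoff for "longer" versus "shorter" segments, so the caller must supply one.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple


def segments_from_boundaries(n_sentences: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert boundary indices into half-open (start, end) sentence spans.

    Assumed convention: a boundary b means a new segment starts at sentence b.
    """
    if n_sentences == 0:
        return []
    inner = sorted({b for b in boundaries if 0 < b < n_sentences})
    cuts = [0] + inner + [n_sentences]
    return list(zip(cuts, cuts[1:]))


def two_step_detect(
    sentences: Sequence[str],
    detect_boundaries: Callable[[Sequence[str]], Sequence[int]],  # step (i), hypothetical model
    classify_segment: Callable[[Sequence[str]], str],             # step (ii), hypothetical model
) -> List[str]:
    """Return one authorship label per sentence via segment-then-classify."""
    labels: List[str] = []
    for start, end in segments_from_boundaries(len(sentences), detect_boundaries(sentences)):
        # Every sentence in a segment inherits the segment-level label.
        labels.extend([classify_segment(sentences[start:end])] * (end - start))
    return labels


def average_segment_length(labels: Sequence[str]) -> float:
    """Mean length of runs of consecutive sentences sharing the same label."""
    runs = [len(list(group)) for _, group in groupby(labels)]
    return sum(runs) / len(runs) if runs else 0.0


def choose_strategy(avg_len: float, threshold: float) -> str:
    """Pick a strategy from the average segment length.

    `threshold` is a hypothetical tuning parameter; the source gives no value.
    """
    return "segmentation-based" if avg_len >= threshold else "sentence-by-sentence"
```

In practice the average segment length would have to be estimated before detection, since true labels are unavailable at inference time; the TL;DR lists predicting segment length as a future direction.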

Paper Structure (8 sections, 1 figure, 2 tables)

This paper contains 8 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: This example illustrates that a realistic (imperfect) segment detector is more likely to produce authorship-inconsistent segments (e.g., segments $e$ and $f$) in hybrid texts with a higher number of boundaries than in those with fewer boundaries. Statistics related to this phenomenon can be seen in Table \ref{tb:tb4}.
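The idea of an authorship-inconsistent segment in the Figure 1 caption can be made concrete with a small check. The sketch below is illustrative only: the function name, the label strings, and the toy boundary predictions are hypothetical, and it assumes a boundary index marks the first sentence of a new segment.

```python
from typing import List, Sequence, Tuple


def inconsistent_segments(
    true_labels: Sequence[str], predicted_boundaries: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return predicted (start, end) spans that mix more than one true author."""
    n = len(true_labels)
    inner = sorted({b for b in predicted_boundaries if 0 < b < n})
    cuts = [0] + inner + [n]
    return [
        (start, end)
        for start, end in zip(cuts, cuts[1:])
        if len(set(true_labels[start:end])) > 1  # more than one author inside the span
    ]


# Toy text with one true boundary; the detector finds it exactly.
few = ["human", "human", "human", "ai", "ai", "ai"]
print(inconsistent_segments(few, [3]))

# Toy text with five true boundaries; the detector recovers only two cut points.
many = ["human", "ai", "human", "ai", "human", "ai"]
print(inconsistent_segments(many, [2, 4]))
```

With the same kind of imperfect detector, each missed boundary leaves a span containing sentences from both authors, so texts with more boundaries offer more opportunities for inconsistent segments.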