AI-generated text boundary detection with RoFT

Laida Kushnareva, Tatiana Gaintseva, German Magai, Serguei Barannikov, Dmitry Abulkhanov, Kristian Kuznetsov, Eduard Tulchinskii, Irina Piontkovskaya, Sergey Nikolenko

TL;DR

This work finds that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; it also finds which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.

Abstract

Due to the rapid development of large language models, people increasingly encounter texts that start as human-written but continue as machine-generated. Detecting the boundary between the human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in the literature. We attempt to bridge this gap and examine several ways to adapt state-of-the-art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits using the Real or Fake Text (RoFT) benchmark, which contains short texts on several topics and includes generations of various language models. We use this diversity to examine in depth the robustness of all detectors in cross-domain and cross-model settings, providing baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also identify which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.
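
To make the boundary detection task concrete, here is a minimal sketch, assuming per-sentence perplexity scores have already been computed by some language model. The function name `detect_boundary`, the sample scores, and the mean-gap heuristic are illustrative assumptions and not the pipeline evaluated in the paper; the sketch also assumes that generated sentences tend to receive lower perplexity than human-written ones.

```python
from statistics import mean


def detect_boundary(scores):
    """Guess the 0-based index of the first machine-generated sentence.

    `scores` holds one perplexity value per sentence (hypothetical inputs).
    Assumption: the text starts human-written and generated sentences
    have lower perplexity, so we pick the split with the largest drop
    in mean perplexity between the prefix and the suffix.
    """
    if len(scores) < 2:
        raise ValueError("need at least two sentences to place a boundary")

    best_index, best_gap = 1, float("-inf")
    # Try every split that leaves at least one sentence on each side.
    for k in range(1, len(scores)):
        gap = mean(scores[:k]) - mean(scores[k:])
        if gap > best_gap:
            best_index, best_gap = k, gap
    return best_index


# Made-up scores: two high-perplexity sentences, then four low ones.
example_scores = [42.0, 38.5, 12.1, 10.4, 11.8, 9.9]
boundary = detect_boundary(example_scores)
print(f"generation is guessed to start at sentence {boundary + 1}")
```

With these made-up scores the guess is the third sentence. A learned classifier over perplexity features, as studied in the paper, would replace the fixed heuristic used here.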

Paper Structure (13 sections, 6 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Sample input from the RoFT-chatgpt dataset. The rightmost part of the picture shows the true boundary between human-written and machine-generated text. The other three parts are colored according to the features extracted by various classifiers. Here, gpt3.5 predicts that only the last two sentences are generated; TLE + TS binary predicts that generation starts from the second sentence; phi1.5 perplexity predicts that generation starts from the third sentence, which is the correct answer. See Section \ref{sec:method} for a detailed explanation of all the methods and Appendix \ref{sec:misclassified_examples} for more examples from the dataset.
  • Figure 2: Sentence length distributions in RoBERTa tokens, original RoFT, by model
  • Figure 4: Multilabel time series classification: (a) estimating the intrinsic dimension of token embeddings in a sliding window, (b) a sample resulting series: green -- human-written, orange -- machine-generated tokens.
  • Figure 5: Sentence length distributions in RoBERTa tokens, original RoFT, by topic
  • Figure 7: Distribution of pretrained (but not fine-tuned) RoBERTa [CLS] embeddings of real and fake parts of text samples from the original RoFT and RoFT-chatgpt datasets. The dimension is reduced to 2D via principal component analysis (doi:10.1080/14786440109462720).
  • ...and 6 more figures