Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Kathleen C. Fraser, Hillary Dawkins, Svetlana Kiritchenko

TL;DR

The paper surveys the landscape of AI-generated text (AIGT) detection, detailing watermarking, statistical and stylistic analyses, and language-model-based classifiers, and evaluates their strengths, weaknesses, and applicability across detection scenarios. It highlights the crucial role of dataset domain, language, and model characteristics in detector performance, and emphasizes the fragility of detectors to adversarial attacks and out-of-distribution data. The authors advocate ensemble approaches, domain-aware training data, and human-in-the-loop strategies, and call for multilingual, fair, and transparent detection frameworks. The work underscores the societal importance of robust AIGT detection amid rapidly evolving large language model (LLM) capabilities and regulatory considerations, and offers practical guidance for researchers and practitioners. It also identifies key gaps, such as cross-lingual generalization, unseen models, and multimodal detection, as fertile ground for future research.

Abstract

Large language models (LLMs) have advanced to the point that even humans have difficulty discerning whether a text was written by a human or generated by a computer. However, knowing whether a text was produced by a human or by artificial intelligence (AI) is important for determining its trustworthiness, with applications in many domains, including detecting fraud and academic dishonesty and combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging and highly critical. In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how "detectable" AIGT is under different scenarios, and to make practical recommendations for future work on this significant technical and societal challenge.

Paper Structure (24 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Some common types of misuse of AI-generated text.
  • Figure 2: Text is generated one word at a time: given the input text, the model samples the next word from a probability distribution over the vocabulary. (a) When a sentence is highly predictable, the probability of each generated word is high, and the model is certain about the next word. (b) More typically, even simple sentences have many reasonable continuations at each generation step, and each possibility takes the story in a different direction.
  • Figure 3: Generating text with watermarking using red-green lists.
  • Figure 4: AIGT tends to occupy the negative curvature regions of the probability function.
  • Figure 5: Detection methods available in different detection scenarios.
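Figure 3 depicts watermarking with red-green lists. The Python below is an illustrative sketch of that idea, not the survey's own code, and rests on several assumptions: a toy 1,000-token vocabulary, a green list seeded by a SHA-256 hash of the previous token, a green fraction γ = 0.5, and a "hard" variant in which the generator may pick only green tokens, with a random sampler standing in for a language model. Detection needs only the hashing rule, not the model: it counts green tokens and computes a one-proportion z-score, z = (|s|_G − γT) / sqrt(Tγ(1 − γ)), where |s|_G is the number of green tokens among the T scored tokens. All names (`VOCAB`, `GAMMA`, `green_list`, and so on) are hypothetical.

```python
import hashlib
import math
import random

# Hypothetical toy vocabulary; a real system would use the model tokenizer's vocabulary.
VOCAB = [f"tok{i}" for i in range(1000)]
GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def green_list(prev_token, vocab=VOCAB, gamma=GAMMA):
    """Pseudo-randomly partition the vocabulary, seeded by a hash of the previous token."""
    seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(gamma * len(vocab))])


def generate_watermarked(n_tokens, start="tok0", seed=0):
    """Stand-in for an LM: sample uniformly, but only from the current green list (hard watermark)."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(n_tokens):
        greens = sorted(green_list(tokens[-1]))  # sorted for a deterministic choice order
        tokens.append(rng.choice(greens))
    return tokens


def generate_unwatermarked(n_tokens, start="tok0", seed=0):
    """Stand-in for human or unwatermarked text: sample uniformly from the whole vocabulary."""
    rng = random.Random(seed)
    return [start] + [rng.choice(VOCAB) for _ in range(n_tokens)]


def watermark_z_score(tokens, gamma=GAMMA):
    """Z-score of the green-token count against the gamma * T expected by chance."""
    t = len(tokens) - 1  # every scored token needs a predecessor to seed its green list
    hits = sum(tok in green_list(prev, gamma=gamma) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


if __name__ == "__main__":
    marked = generate_watermarked(200)
    plain = generate_unwatermarked(200)
    print(f"watermarked z = {watermark_z_score(marked):.2f}")  # sqrt(200), about 14.14
    print(f"unwatermarked z = {watermark_z_score(plain):.2f}")
```

In this toy setting every watermarked token is green, so z equals √T (about 14.1 for T = 200), while unwatermarked text scores near 0; a detector would flag text whose z exceeds a chosen threshold such as 4. Softer variants that only boost green-token logits yield smaller z-scores, and paraphrasing that replaces tokens erodes the signal, which illustrates the fragility to adversarial attacks that the survey emphasizes.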