DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios

Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao

TL;DR

DetectRL introduces a real-world-oriented benchmark for LLM-generated text detection by combining human-written samples from high-risk domains with samples generated by widely used LLMs under realistic attack settings. The framework evaluates detectors on four tasks (robustness, generalization, text-length effects, and real-world writing factors) and shows that zero-shot detectors underperform supervised models, especially under adversarial perturbations and data-mixing scenarios. Key findings are that domain style, model-specific statistical patterns, and attack methods significantly influence detector performance; that shorter training texts benefit zero-shot detectors while longer test texts improve their performance; and that supervised detectors remain robust across conditions. DetectRL also provides a scalable data-augmentation and attack framework so that the benchmark can evolve as LLM capabilities advance, with the aim of driving the development of more robust, real-world detectors.

Abstract

Detecting text generated by large language models (LLMs) is of great recent interest. With zero-shot methods like DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applications remains underexplored. In this study, we present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperform on this task. We collected human-written datasets from domains where LLMs are particularly prone to misuse. Using popular LLMs, we generated data that better aligns with real-world applications. Unlike previous studies, we employed heuristic rules to create adversarial LLM-generated text, simulating various prompt usages, human revisions such as word substitutions, and writing noise such as spelling mistakes. Our development of DetectRL reveals the strengths and limitations of current SOTA detectors. More importantly, we analyzed the potential impact of writing styles, model types, attack methods, text lengths, and real-world human writing factors on different types of detectors. We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios, evolving with advanced attack methods and thus providing more demanding evaluation to drive the development of more efficient detectors. Data and code are publicly available at: https://github.com/NLP2CT/DetectRL.
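
To make the idea of heuristic "writing noise" concrete, the following is a minimal sketch of one possible spelling-mistake perturbation. It is not the authors' implementation: the function name `add_spelling_noise`, the `rate` and `seed` parameters, and the adjacent-letter-swap rule are all assumptions chosen for illustration. The actual attack rules are in the DetectRL repository.

```python
import random


def add_spelling_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap two adjacent inner letters in a fraction of words.

    Hypothetical heuristic for illustration only; not the DetectRL attack code.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    noisy_words = []
    for word in text.split():
        # Only perturb purely alphabetic words with at least two inner letters.
        if word.isalpha() and len(word) >= 4 and rng.random() < rate:
            # Pick an inner position so the first and last letters stay in place.
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy_words.append(word)
    return " ".join(noisy_words)


if __name__ == "__main__":
    sample = "Detecting generated text remains difficult under realistic noise"
    print(add_spelling_noise(sample, rate=0.5, seed=42))
```

Keeping the first and last letters fixed is a design choice that mimics common human typos while leaving the text readable. A detector that is robust in the sense discussed above should give similar scores to the clean and the perturbed text.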

Paper Structure

This paper contains 82 sections, 2 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: The overall framework of DetectRL. Human-written samples are collected from high-risk and abuse-prone domains. We employ widely-used and powerful LLMs to create LLM-generated samples. All samples undergo well-designed attacks to simulate real-world scenarios and a varying length augmentation method is applied to enhance the benchmark's diversity. DetectRL consists of four distinct tasks to evaluate the detectors' comprehensive detection abilities and robustness.
  • Figure 2: Benchmark statistics.
  • Figure 3: Impact of text length on AUROC during training-time and test-time.
  • Figure 4: Text length distribution of DetectRL.
  • Figure 5: N-gram distribution of DetectRL.
  • ...and 5 more figures
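
Figure 3 reports detector quality as AUROC. As a reminder of what that metric measures, here is a small self-contained sketch that computes it from pairwise comparisons. The function name `auroc` and the label convention (1 for LLM-generated, 0 for human-written) are assumptions made for this example and are not taken from the DetectRL code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Assumed convention: label 1 = LLM-generated, label 0 = human-written.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    # Compare every positive score against every negative score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two human-written and two LLM-generated texts with detector scores.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic pairwise form is used here for clarity. For large evaluation sets a rank-based computation or a library routine would be the more practical choice.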