Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection

Shantanu Thorat, Tianbao Yang

TL;DR

The paper investigates which LLMs are most difficult to detect across writing domains by training LibAUC-optimized classifiers on two datasets (Deepfake Text and RIP) and evaluating cross-domain and cross-LLM generalization. Using DistilRoBERTa as the backbone, the study reveals domain-specific variation in detection difficulty, with scientific writing being notably challenging and OpenAI-generated texts posing substantial detection hurdles unless detectors are trained on OpenAI data. Analyses of entropy and OOV ratios point to OpenAI texts' closer resemblance to human writing, suggesting linguistic factors beyond surface cues contribute to detectability. The findings highlight the importance of diverse, cross-domain training data for robust AI-text detectors and provide a foundation for future work on feature-driven detection strategies.

Abstract

As LLMs become more accessible, LLM-generated texts have proliferated across several fields, such as scientific, academic, and creative writing. However, not all LLMs are created equal; they may differ in architecture and training data. Thus, some LLMs may be more challenging to detect than others. Using two datasets spanning four writing domains in total, we train AI-generated (AIG) text classifiers using the LibAUC library, a deep learning library for training classifiers on imbalanced datasets. Our results on the Deepfake Text dataset show that AIG-text detection performance varies across domains, with scientific writing being relatively challenging. On the Rewritten Ivy Panda (RIP) dataset, which focuses on student essays, we find that texts from the OpenAI family of LLMs were substantially difficult for our classifiers to distinguish from human texts. Additionally, we explore possible factors that could explain the difficulties in detecting OpenAI-generated texts.

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: A framework for training a classifier to detect AIG texts. We vary the model $A$ across different model families, as shown in Table \ref{tbl:deepfake-llms}.
  • Figure 2: CMV testing framework for evaluating OpenAI-trained classifiers' performance on OpenAI texts. We have three OpenAI classifiers with three different possible test sets leading to nine AUC scores. The mean AUC score for OpenAI classifiers on OpenAI texts was 0.976. This testing procedure was repeated across all combinations of LLM families.
  • Figure 3: Mean AUC by LLM family on the RIP Bedrock dataset. Mean AUC is computed identically to the Deepfake dataset in Figure \ref{fig:llm_family_testing_framework}.
  • Figure 4: Kernel density estimates of the entropy distributions for the four LLM families (Claude, Llama2, Mistral, and OpenAI) from the test set. The KDE for entropy in human-authored essays is included as a baseline.
  • Figure 5: The empirical cumulative distribution function plots for the four LLM families (Claude, Llama2, Mistral, and OpenAI) from the test set. The ECDF for human-authored essays is included as a baseline.
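
The mean-AUC procedure described in the Figure 2 caption (three classifiers from one LLM family, each scored on three test sets, giving nine AUC scores that are then averaged) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `auc_score` and `mean_family_auc`, the representation of a classifier as a plain scoring function, and the pure-Python rank-based AUC are all assumptions made for the example.

```python
from itertools import product
from statistics import mean


def auc_score(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_family_auc(classifiers, test_sets):
    """Score every (classifier, test set) pair and average the AUCs.

    classifiers: hypothetical list of callables mapping one sample to a score
                 (higher = more likely AI-generated).
    test_sets:   list of (samples, labels) tuples, with label 1 = AI-generated.
    With three classifiers and three test sets this yields nine AUC scores.
    """
    aucs = [
        auc_score(labels, [clf(sample) for sample in samples])
        for clf, (samples, labels) in product(classifiers, test_sets)
    ]
    return mean(aucs), aucs
```

Under these assumptions, the same function could be called once per (training family, test family) combination to fill in a family-by-family table of mean AUCs.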