Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection
Shantanu Thorat, Tianbao Yang
TL;DR
The paper investigates which LLMs are most difficult to detect across writing domains by training classifiers with the LibAUC library on two datasets (Deepfake Text and RIP) and evaluating cross-domain and cross-LLM generalization. Using DistilRoBERTa as the backbone, the study finds that detection difficulty varies by domain, with scientific writing being notably challenging, and that OpenAI-generated texts are hard to detect unless the detectors are trained on OpenAI data. Analyses of entropy and out-of-vocabulary (OOV) ratios indicate that OpenAI texts more closely resemble human writing, suggesting that linguistic factors beyond surface cues contribute to detectability. The findings highlight the importance of diverse, cross-domain training data for robust AI-text detectors and provide a foundation for future work on feature-driven detection strategies.
Abstract
As LLMs become more accessible, LLM-generated texts have proliferated across several fields, including scientific, academic, and creative writing. However, not all LLMs are created equal; they may differ in architecture and training data. Thus, some LLMs may be more challenging to detect than others. Using two datasets spanning a total of four writing domains, we train AI-generated (AIG) text classifiers using the LibAUC library, a deep learning library for training classifiers on imbalanced datasets. Our results on the Deepfake Text dataset show that AIG-text detection performance varies across domains, with scientific writing being relatively challenging. On the Rewritten Ivy Panda (RIP) dataset, which focuses on student essays, we find that texts from the OpenAI family of LLMs were particularly difficult for our classifiers to distinguish from human texts. Additionally, we explore possible factors that could explain the difficulty of detecting OpenAI-generated texts.
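
The TL;DR names entropy and OOV ratios as two text-level factors examined for OpenAI-generated texts. The summary does not say how the paper computes them, so the following is only a minimal sketch under stated assumptions: whitespace tokenization, Shannon entropy over each text's own unigram distribution, and an OOV ratio measured against a caller-supplied reference vocabulary. The function names `tokenize`, `unigram_entropy`, and `oov_ratio` are hypothetical and are not taken from the paper or from LibAUC.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation.

    This simple tokenizer is an assumption made for illustration.
    """
    tokens = (t.strip(".,;:!?\"'()[]") for t in text.lower().split())
    return [t for t in tokens if t]


def unigram_entropy(text):
    """Shannon entropy (in bits) of the text's own unigram distribution."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def oov_ratio(text, vocabulary):
    """Fraction of tokens that do not appear in the reference vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t not in vocabulary for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from either dataset.
    vocab = {"the", "model", "writes", "text"}
    sample = "The model writes fluent text."
    print(unigram_entropy(sample))
    print(oov_ratio(sample, vocab))
```

Comparing the distributions of these two statistics between human-written and LLM-generated texts is one way to check whether a given model's output sits closer to human writing, which is the kind of comparison the TL;DR describes.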
