Table of Contents
Fetching ...

From Text to Source: Results in Detecting Large Language Model-Generated Content

Wissam Antoun, Benoît Sagot, Djamé Seddah

TL;DR

This paper investigates “Cross-Model Detection,” by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training.

Abstract

The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns necessitates the development of robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection," by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The study comprehensively explores various LLM sizes and families, and assesses the impact of conversational fine-tuning techniques, quantization, and watermarking on classifier generalization. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection. Our results reveal several key findings: a clear inverse relationship between classifier effectiveness and model size, with larger LLMs being more challenging to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs can improve detection performance from larger models but may lead to decreased performance when dealing with smaller models. Additionally, model attribution experiments show promising results in identifying source models and model families, highlighting detectable signatures in LLM-generated text, with particularly remarkable outcomes in watermarking detection, while no detectable signatures of quantization were observed. Overall, our study contributes valuable insights into the interplay of model size, family, and training data in LLM detection and attribution.

From Text to Source: Results in Detecting Large Language Model-Generated Content

TL;DR

This paper investigates “Cross-Model Detection,” by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training.

Abstract

The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns necessitates the development of robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection," by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The study comprehensively explores various LLM sizes and families, and assesses the impact of conversational fine-tuning techniques, quantization, and watermarking on classifier generalization. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection. Our results reveal several key findings: a clear inverse relationship between classifier effectiveness and model size, with larger LLMs being more challenging to detect, especially when the classifier is trained on data from smaller models. Training on data from similarly sized LLMs can improve detection performance from larger models but may lead to decreased performance when dealing with smaller models. Additionally, model attribution experiments show promising results in identifying source models and model families, highlighting detectable signatures in LLM-generated text, with particularly remarkable outcomes in watermarking detection, while no detectable signatures of quantization were observed. Overall, our study contributes valuable insights into the interplay of model size, family, and training data in LLM detection and attribution.
Paper Structure (31 sections, 9 figures, 1 table)

This paper contains 31 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: 5-seed averaged AUC scores for a classifier trained on text from a source model (Side axis) and tested on text from a target model (Top axis). We provide the full results table with values in Appendix \ref{['appendix:full_table']}.
  • Figure 2: Average target AUC scores vs model size. OLS Trend lines are drawn for each set of model family
  • Figure 3: Normalized confusion matrix for Source Model Identification. 5-seed averaged and normalized by the predicted class support. For clarity purposes, we hide labels on values less than 4.
  • Figure 4: Conversational models cross-model detection with their foundation LLM (5-seed averaged AUCROC score).
  • Figure 5: Quantized models cross-model detection (5-seed averaged AUCROC score).
  • ...and 4 more figures