Origin Tracing and Detecting of LLMs

Linyang Li, Pengyu Wang, Ke Ren, Tianxiang Sun, Xipeng Qiu

TL;DR

The paper addresses tracing the origins of AI-generated text in an era of evolving LLMs. It introduces Sniffer, a model-wise feature-based method that uses contrastive perplexity features across multiple open-source LLMs and a simple linear classifier to identify text origins, working in both white-box and black-box settings with limited data. Experiments on the SnifferBench benchmark show strong tracing for several known origins and some generalization to unknown origins, though stronger models pose challenges; incorporating GPT-3 logits markedly improves tracing for GPT-3-family outputs. The work provides a practical toolkit and benchmark for AI-origin tracing, highlighting implications for safety, model provenance, and ethical considerations in LLM deployment.

Abstract

The extraordinary performance of large language models (LLMs) heightens the importance of detecting whether a given context is generated by an AI system. More importantly, as more and more companies and institutions release their LLMs, the origin of generated text can be hard to trace. Since LLMs are heading towards the era of AGI, tracing the origin of LLMs is of great importance, much like origin tracing in anthropology. In this paper, we first raise the concern of origin tracing for LLMs and propose an effective method to trace and detect AI-generated contexts. We introduce a novel algorithm that leverages the contrastive features between LLMs and extracts model-wise features to trace text origins. Our proposed method works under both white-box and black-box settings and can therefore be widely generalized to detect various LLMs (e.g., it can detect GPT-3 models without access to the GPT-3 models). Also, our proposed method requires only limited data compared with supervised learning methods and can be extended to trace newly released model origins. We conduct extensive experiments to examine whether we can trace the origins of given texts. We provide valuable observations based on the experimental results, such as the difficulty level of AI origin tracing and the similarities between AI origins, and call for LLM providers to consider the ethical concerns. We are releasing all code and data as a toolkit and benchmark for future AI origin tracing and detection studies. All available resources are released at https://github.com/OpenLMLab/.
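
To make the idea of model-wise contrastive features concrete, here is a minimal sketch in pure Python. It is not the released Sniffer code. It assumes that per-token log-probabilities of a text are already available from several scoring models, that the features are each model's log-perplexity plus all pairwise differences, and that the linear classifier is a small softmax regression trained with SGD. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def perplexity(token_logprobs):
    """Perplexity of one text from natural-log token probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def contrastive_features(logprobs_by_model):
    """Build a feature vector from several scoring models' views of one text.

    Assumption: features are each model's log-perplexity plus all pairwise
    differences between them (the "contrast"); the paper's exact feature
    set may differ.
    """
    log_ppl = [math.log(perplexity(lp)) for lp in logprobs_by_model]
    diffs = [a - b for a, b in combinations(log_ppl, 2)]
    return log_ppl + diffs


def _scores(weights, x):
    # One linear score per class; the last weight in each row is the bias.
    return [sum(w * v for w, v in zip(row, x)) + row[-1] for row in weights]


def _softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_classifier(X, y, n_classes, lr=0.1, epochs=200):
    """Fit a softmax-regression (linear) classifier with plain SGD."""
    weights = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = _softmax(_scores(weights, x))
            for c, row in enumerate(weights):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x):
                    row[j] -= lr * grad * v
                row[-1] -= lr * grad
    return weights


def predict(weights, x):
    """Return the index of the highest-scoring origin class."""
    scores = _scores(weights, x)
    return scores.index(max(scores))


# Hypothetical toy data: per-token log-probs of each text under two scoring
# models (A, B). Label 0 = text that model A finds likelier, 1 = model B.
texts = [
    ([[-1.0, -1.2, -0.8], [-2.5, -2.0, -2.7]], 0),
    ([[-0.9, -1.1], [-2.2, -2.4]], 0),
    ([[-2.4, -2.6, -2.1], [-0.9, -1.1, -1.0]], 1),
    ([[-2.0, -2.2], [-1.2, -0.8]], 1),
]
X = [contrastive_features(lps) for lps, _ in texts]
y = [label for _, label in texts]
weights = train_linear_classifier(X, y, n_classes=2)
predictions = [predict(weights, x) for x in X]
```

In this sketch the classifier only ever sees a handful of numbers per text, which is one plausible reason such an approach could get by with limited training data. In a black-box setting the scoring models would be open-source stand-ins rather than the model that actually produced the text.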

Paper Structure (14 sections, 6 figures, 8 tables)

This paper contains 14 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: A knowledge flow of LLMs; with origin tracing, we can trace Alpaca back to ChatGPT and LLaMA.
  • Figure 2: Illustration of the Sniffer process.
  • Figure 3: The discrepancy between text origins under different baseline methods. Each panel applies one model with one detection method to the given texts, and each bar corresponds to a different text origin.
  • Figure 4: Comparison with Supervised Learning Methods that utilize semantic-wise features.
  • Figure 5: Tracing different types of generated texts: (a) ChatGPT tracing results under different instructions; (b) tracing results that separate GPT-J and GPT-Neo origins and compute the F1-score for each origin; (c) tracing results for Alpaca and Dolly, instructed LLMs built by supervised fine-tuning of LLaMA/GPT-J on ChatGPT instructions.
  • ...and 1 more figure