
DE-COP: Detecting Copyrighted Content in Language Models Training Data

André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li

TL;DR

DE-COP reframes the detection of copyrighted content in LLM training data as a paraphrase-based multiple-choice question answering (MCQA) task that works both when model logits are available and in fully black-box settings. It introduces the BookTection and arXivTection benchmarks to evaluate memorization signals across domains, and shows that DE-COP surpasses the prior best method by 9.6% in AUC on models with logits available and reaches an average accuracy of 72% on fully black-box models. A logit calibration step mitigates selection bias; model size and paraphrase provenance both influence performance, and human evaluators show limited ability to replicate the results. The work provides a foundation for accountability in training data provenance and offers practical tools for auditing model content usage, while noting limitations related to passage length, context size, and the interpretability of paraphrase quality.

Abstract

How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions whose options include both the verbatim text and its paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published both before and after a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP achieves an average accuracy of 72% for detecting suspect books on fully black-box models, where prior methods give approximately 4% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.
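
As a rough illustration of the multiple-choice probe described in the abstract, the sketch below builds one question from a verbatim passage and three paraphrases and shuffles the option order. The function name `build_mcqa`, the prompt wording, and the use of a seeded random generator are assumptions made for this example; they are not taken from the authors' implementation.

```python
import random

LABELS = ["A", "B", "C", "D"]

def build_mcqa(verbatim, paraphrases, rng):
    """Return (prompt, correct_label) for one four-option question.

    Hypothetical helper: the prompt text below is illustrative only.
    """
    if len(paraphrases) != 3:
        raise ValueError("expected exactly three paraphrases")
    options = [verbatim] + list(paraphrases)
    rng.shuffle(options)  # randomize where the verbatim passage appears
    correct_label = LABELS[options.index(verbatim)]
    lines = ["Which of the following passages is the verbatim excerpt from the book?"]
    lines += [f"{label}. {text}" for label, text in zip(LABELS, options)]
    lines.append("Answer:")
    return "\n".join(lines), correct_label
```

Calling `build_mcqa(passage, [p1, p2, p3], random.Random(0))` returns the prompt to send to the target model together with the label that counts as a correct answer.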

Paper Structure (35 sections, 9 figures, 14 tables, 1 algorithm)

Figures (9)

  • Figure 1: Our DE-COP identifies copyrighted books within ChatGPT training data. We detect that a specific book was seen during training by showing that the LLM's performance on the task of identifying verbatim book passages is significantly higher on a "suspect" book than on a recent one (published 2023 onward).
  • Figure 2: DE-COP involves a three-step process. First, we create a dataset by extracting passages from various books and paraphrasing each one three times using Claude 2. Then, the target LLM is presented with the original passage alongside its three paraphrases and must identify the verbatim passage among the multiple-choice options; we run this task on a selection of "clean" books to establish an average baseline performance. Finally, to determine whether a particular book is included in a model's training data, we compare the model's performance on that book against the baseline. If the model shows significantly higher accuracy, this suggests that the book was in the training data.
  • Figure 3: Calibration approach. We compare the expected average token probability on a small set of unseen books with the empirically observed one. We then compute the prior adjustment needed for the option tokens before determining the most probable label.
  • Figure 4: AUC performance across different model sizes.
  • Figure 5: AUC performance across different passage lengths.
  • ...and 4 more figures
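
The final step in the Figure 2 caption, comparing a book's accuracy against the clean-book baseline, can be sketched as follows. This is a minimal illustration under stated assumptions: the one-sided binomial test, the function names `binomial_tail` and `is_suspect`, and the significance level `alpha=0.05` are choices made for this example, and the paper's actual decision rule may differ.

```python
from math import comb

def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def is_suspect(correct, total, baseline_acc, alpha=0.05):
    """Flag a book whose MCQA accuracy is significantly above the baseline.

    Assumption: each question is treated as an independent trial whose
    success probability under the null equals the clean-book accuracy.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return binomial_tail(correct, total, baseline_acc) < alpha
```

With four options per question, a model that has never seen the book would be expected to score near 25% if it guesses uniformly, so the measured clean-book accuracy serves as the `baseline_acc` argument rather than a fixed constant.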