Evaluating Copyright Takedown Methods for Language Models

Boyi Wei; Weijia Shi; Yangsibo Huang; Noah A. Smith; Chiyuan Zhang; Luke Zettlemoyer; Kai Li; Peter Henderson

TL;DR

This work formalizes copyright takedown for language models and introduces CoTaEval, a benchmark to assess whether takedown methods prevent generation of blocklisted content while preserving uncopyrightable factual knowledge and maintaining efficiency. It catalogs a taxonomy of interventions (generic prevention, decoding-time, and unlearning) and evaluates eight methods across memorization and retrieval-augmented settings using books and news domains. Across model families, no method achieves optimal performance on all metrics, revealing trade-offs between reducing similarity to copyrighted content and preserving factual utility and efficiency. The findings highlight the need for further methodological development and careful consideration of deployment policies for copyright-sensitive outputs.

Abstract

Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.
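To make the notion of "similarity to blocklisted content" concrete, the sketch below scores a generation against a blocklisted passage in two ways: the longest run of consecutive shared words (a proxy for exact-match regurgitation) and the Jaccard overlap of word trigrams (a proxy for near-duplicate regurgitation). Both function names and the choice of these two measures are assumptions made for illustration; this summary does not specify which metrics CoTaEval uses.

```python
def longest_common_word_run(generated: str, blocklisted: str) -> int:
    """Length, in words, of the longest contiguous word sequence shared by both texts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    a = generated.lower().split()
    b = blocklisted.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def ngram_jaccard(generated: str, blocklisted: str, n: int = 3) -> float:
    """Jaccard similarity between the sets of word n-grams of the two texts.

    Hypothetical helper for illustration; n=3 is an arbitrary default.
    """
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    g, b = ngrams(generated), ngrams(blocklisted)
    if not g or not b:
        return 0.0
    return len(g & b) / len(g | b)


if __name__ == "__main__":
    blocklisted = "the quick brown fox jumps over the lazy dog"
    generated = "a quick brown fox jumps high"
    print(longest_common_word_run(generated, blocklisted))
    print(ngram_jaccard(generated, blocklisted))
```

Under this framing, a takedown method would aim to push both scores down on blocklisted passages while leaving answers to factual questions about the same material unchanged, which is the trade-off the abstract describes.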

Paper Structure (47 sections, 4 equations, 14 figures, 15 tables)

Introduction
Copyright and Language Models
Causes to Regurgitation of Copyrighted Contents
Takedown Methods for Language Models
Generic Prevention Strategies
Decoding-Time Takedowns
Training-based Takedowns (Unlearning)
The CoTaEval Evaluation Pipeline
Evaluation Corpus and Target Scenarios
Metrics
Risk Evaluation
Utility and Efficiency Evaluation
Experiments
Experiment Setup
Results and Observations
...and 32 more sections

Figures (14)

Figure 1: Effective takedown methods should prevent models from generating text matching the blocklisted content (low similarity) while preserving uncopyrightable facts and fair use information (high utility).
Figure 2: CoTaEval investigates three scenarios of undesirable regurgitation motivated by copyright concerns: (a) exact match, (b) near-duplicate match, and (c) generation of semantically similar text. Verbatim matching sequences are highlighted in green, and semantically similar sequences are highlighted in yellow.
Figure 3: Violin Plot for RAG scenario
Figure 4: Violin Plot for Memorization scenario
Figure 6: Violin Plot for Llama2-7B-chat model
...and 9 more figures
