Certified Mitigation of Worst-Case LLM Copyright Infringement

Jingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi

TL;DR

This work addresses the risk that LLMs may output long verbatim quotes from copyrighted sources by introducing BloomScrub, an inference-time, certified copyright takedown method. It interleaves efficient quote detection via a Bloom filter with guided rewriting to scrub high-risk quotes, repeating until no detected quote exceeds a length-based risk threshold, or abstaining from responding when that cannot be achieved, which certifies the risk reduction. The approach is scalable, plug-and-play, and adjustable through the risk threshold and the number of rewrite iterations, with abstention providing a hard safety guarantee. Empirical results show that BloomScrub outperforms existing baselines in worst-case infringement reduction while preserving information quality and utility, especially on large corpora.
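The detect-rewrite-abstain loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: quotes are found by merging consecutive character n-grams that appear in the protected corpus (a plain Python `set` stands in for the Bloom filter), a span counts as high-risk when its length is at least the threshold `tau`, and `rewrite` is a hypothetical callback standing in for the guided LLM rewriter. The names `flagged_spans`, `scrub`, and `max_rewrites` are illustrative.

```python
from typing import Callable, List, Optional, Set, Tuple


def flagged_spans(text: str, ngrams: Set[str], n: int) -> List[Tuple[int, int]]:
    """Merge consecutive positions whose character n-gram is in `ngrams`.

    Returns (start, end) character offsets of each maximal flagged run.
    `ngrams` is a plain set here, standing in for the Bloom filter.
    """
    spans: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # Iterate one step past the last n-gram so an open run is always closed.
    for i in range(len(text) - n + 2):
        hit = i <= len(text) - n and text[i:i + n] in ngrams
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            # The last flagged n-gram started at i - 1, so the run ends at i - 1 + n.
            spans.append((run_start, i - 1 + n))
            run_start = None
    return spans


def scrub(
    response: str,
    ngrams: Set[str],
    n: int,
    tau: int,
    rewrite: Callable[[str, List[str]], str],  # hypothetical rewriter callback
    max_rewrites: int = 3,
) -> Optional[str]:
    """Detect-then-rewrite loop; returns None to signal abstention."""
    for attempt in range(max_rewrites + 1):
        risky = [response[s:e] for s, e in flagged_spans(response, ngrams, n) if e - s >= tau]
        if not risky:
            return response  # every detected quote is shorter than tau
        if attempt == max_rewrites:
            break  # rewrite budget exhausted
        response = rewrite(response, risky)
    return None  # quotes of length >= tau survived every rewrite: abstain
```

A toy usage, with a string-replacement "rewriter" in place of an LLM:

```python
corpus = "It was the best of times, it was the worst of times"
n = 10
ngrams = {corpus[i:i + n] for i in range(len(corpus) - n + 1)}

def toy_rewrite(text, quotes):
    for q in quotes:
        text = text.replace(q, "[paraphrase]")
    return text

print(scrub("He said: " + corpus + ".", ngrams, n, tau=20, rewrite=toy_rewrite))
# He said: [paraphrase].
```

Because every returned response has passed the final check, it contains no detected quote of length `tau` or more; the only other outcome is abstention. Detected spans are at least `n` characters long, so `tau` is only meaningful when `tau >= n`.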

Abstract

The exposure of large language models (LLMs) to copyrighted material during pre-training raises concerns about unintentional copyright infringement post-deployment. This has driven the development of "copyright takedown" methods, post-training approaches aimed at preventing models from generating content substantially similar to copyrighted material. While current mitigation approaches are somewhat effective for average-case risks, we demonstrate that they overlook worst-case copyright risks, exhibited by the existence of long, verbatim quotes from copyrighted sources. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our method repeatedly interleaves quote detection with rewriting techniques to transform potentially infringing segments. By leveraging efficient data sketches (Bloom filters), our approach enables scalable copyright screening even for large-scale real-world corpora. When quotes beyond a length threshold cannot be removed, the system can abstain from responding, offering certified risk reduction. Experimental results show that BloomScrub reduces infringement risk, preserves utility, and accommodates different levels of enforcement stringency with adaptive abstention. Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
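To make the "efficient data sketches (Bloom filters)" point concrete, here is a minimal Bloom filter over character n-grams of a corpus. It is a generic textbook construction, not the paper's configuration: the n-gram unit, the value of `n`, the SHA-256 double-hashing scheme, and the default false-positive rate are all assumptions chosen for illustration, and `index_corpus` / `hit_positions` are hypothetical helper names. The property that matters for screening is that a Bloom filter has no false negatives (every indexed n-gram is always reported), while false positives occur at a rate set by the bit-array size and the number of hashes.

```python
import hashlib
import math
from typing import Iterable, Iterator, List


class BloomFilter:
    """Bit-array set sketch: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    @classmethod
    def sized_for(cls, n_items: int, fp_rate: float) -> "BloomFilter":
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n_items = max(1, n_items)
        m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        k = max(1, round(m / n_items * math.log(2)))
        return cls(m, k)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, hence nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def index_corpus(docs: Iterable[str], n: int, fp_rate: float = 1e-6) -> BloomFilter:
    """Insert every character n-gram of every document into one filter."""
    doc_list = list(docs)
    total = sum(max(0, len(d) - n + 1) for d in doc_list)
    bf = BloomFilter.sized_for(total, fp_rate)
    for d in doc_list:
        for i in range(len(d) - n + 1):
            bf.add(d[i:i + n])
    return bf


def hit_positions(text: str, bf: BloomFilter, n: int) -> List[int]:
    """Start offsets of character n-grams in `text` that the filter reports as seen."""
    return [i for i in range(len(text) - n + 1) if text[i:i + n] in bf]
```

The filter stores only bits, never the copyrighted text itself, and each query costs `k` hash probes regardless of corpus size; runs of consecutive hit positions can then be merged into candidate quotes.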

Paper Structure

This paper contains 45 sections, 2 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: BloomScrub works by interleaving two key steps: (1) using a Bloom filter to extract high-risk quotes from model responses, and (2) applying guided rewriting to "scrub" these quotes from the text. This iterative process removes high-risk quotes while preserving utility.
  • Figure 2: BloomScrub drastically outperforms other methods on long quote reduction.
  • Figure 3: Inference-time adaptability of BloomScrub to different risk thresholds $\tau$. As the risk threshold decreases, the mean number of rewrite iterations increases, and BloomScrub continues to reduce the maximum character LCS and the percentage of examples with quotes longer than 100 characters.
  • Figure 4: Distribution of the number of rewrites under different risk thresholds $\tau$. Given a smaller (thus more stringent) $\tau$, the distribution of rewrite counts shifts to the right.
  • Figure 5: Percentage of long quotes ($\geq$50 characters) that contain a long named entity ($\geq$30 characters). A high rate of long named entities indicates that a notable portion of the remaining quotes are difficult to rewrite, suggesting that most quotes that can be rewritten already have been.
  • ...and 1 more figure
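The worst-case metrics in the Figure 3 caption (maximum character LCS, and the share of examples with quotes longer than 100 characters) can be computed along these lines. This sketch assumes that "character LCS" means the longest common contiguous *substring*, which matches the captions' framing of verbatim quotes; if the paper instead uses longest common subsequence, the recurrence would differ. The function names and the choice to take the maximum over a list of reference texts are illustrative assumptions.

```python
from typing import List, Sequence


def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous character run shared by a and b (O(|a||b|) DP)."""
    if not a or not b:
        return 0
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def max_char_lcs(output: str, references: Sequence[str]) -> int:
    """Longest verbatim overlap between one model output and any reference text."""
    return max((longest_common_substring_len(output, r) for r in references), default=0)


def pct_with_long_quotes(outputs: List[str], references: Sequence[str], threshold: int = 100) -> float:
    """Percentage of outputs whose longest verbatim overlap exceeds `threshold` characters."""
    if not outputs:
        return 0.0
    flagged = sum(1 for o in outputs if max_char_lcs(o, references) > threshold)
    return 100.0 * flagged / len(outputs)
```

The quadratic dynamic program is fine for evaluation on individual responses; it is the Bloom filter, not this exact computation, that is meant to scale screening to large corpora.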