Strong Copyright Protection for Language Models via Adaptive Model Fusion

Javier Abad, Konstantin Donhauser, Francesco Pinto, Fanny Yang

TL;DR

This work introduces CP-Fuse, an adaptive model-fusion algorithm that combines logits from multiple base language models to minimize memorization of copyrighted content while preserving generation quality. Grounded in the Near-Access Freeness framework and a separability assumption, CP-Fuse derives a logit-update rule where the next-token probability is a balanced combination of base-model forecasts, enforced by a grid-searched weighting scheme. The authors prove a balancing property and demonstrate, across code and text tasks with heavily overfitted baselines, that CP-Fuse reduces exact and approximate memorization by more than 25x relative to infringement-prone models, while maintaining competitive perplexity and output quality. Moreover, CP-Fuse can be applied on top of any model and integrated with other copyright-protection techniques, offering a practical, scalable wrapper for enhancing model safeguards in real-world deployments. Future directions include relaxing full separability assumptions and exploring combination with additional mitigation methods.
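The fusion step summarized above can be illustrated with a minimal sketch. It assumes two base models that expose next-token log-probabilities, a log-linear combination with non-negative weights, and a grid search over those weights. The function names (`fuse`, `fuse_step`), the arguments `hist1` and `hist2`, the grid range, and the minimax objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax for a 1-D array of finite logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def fuse(logp1, logp2, alpha, beta):
    """Log-linear fusion: alpha*log p1 + beta*log p2, renormalized."""
    return log_softmax(alpha * logp1 + beta * logp2)

def fuse_step(logp1, logp2, hist1=0.0, hist2=0.0, grid=None):
    """Grid-search (alpha, beta) for one decoding step.

    hist1 / hist2 are the cumulative log-likelihoods of the tokens generated
    so far under each base model. The score below is an ASSUMED stand-in
    objective: max_i [ KL(p || p_i) - hist_i ]. It penalizes leaning on the
    model under which the history is already more likely.
    """
    if grid is None:
        grid = np.linspace(0.0, 2.0, 21)  # hypothetical grid range
    best = None
    for alpha in grid:
        for beta in grid:
            logp = fuse(logp1, logp2, alpha, beta)
            p = np.exp(logp)
            score1 = np.sum(p * (logp - logp1)) - hist1
            score2 = np.sum(p * (logp - logp2)) - hist2
            score = max(score1, score2)
            if best is None or score < best[0]:
                best = (score, alpha, beta, logp)
    return best[1], best[2], best[3]

# Toy vocabulary of 4 tokens: model 1 is peaked on token 0, model 2 on token 1.
lp1 = log_softmax(np.array([5.0, 0.0, 2.0, 0.0]))
lp2 = log_softmax(np.array([0.0, 5.0, 2.0, 0.0]))
alpha, beta, fused_logp = fuse_step(lp1, lp2)
print(alpha, beta, np.exp(fused_logp).round(3))
```

In this toy case the fused distribution cannot concentrate on the token that only one model strongly prefers, because doing so would make the divergence to the other model large.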

Abstract

The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimize the reproduction of protected materials. CP-Fuse is inspired by the recently proposed Near-Access Free (NAF) framework and additionally incorporates a desirable balancing property that we demonstrate prevents the reproduction of memorized training data. Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation. Furthermore, we demonstrate how CP-Fuse can be integrated with other techniques for enhanced protection.

Paper Structure (49 sections, 2 theorems, 12 equations, 10 figures, 10 tables)

Key Result

Lemma 4.1

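The optimal solution $p(y_t \, | \, y_{<t}, x)$ of the optimization problem in Equation eq:pt satisfies

$$\log p(y_t \, | \, y_{<t}, x) = \alpha_t \log p^{(1)}(y_t \, | \, y_{<t}, x) + \beta_t \log p^{(2)}(y_t \, | \, y_{<t}, x) + \gamma_t$$

for some $\alpha_t, \beta_t \geq 0, \gamma_t \in \mathbb{R}$, where we set $\log(0) = - \infty$.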
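Read as a log-linear combination of the base models $p^{(1)}$ and $p^{(2)}$ with weights $\alpha_t, \beta_t$ and offset $\gamma_t$ (an assumption about the lemma's form), the condition can be exponentiated to show what each parameter does. The normalizer $Z_t$ below is notation introduced here for illustration:

$$p(y_t \, | \, y_{<t}, x) = \frac{p^{(1)}(y_t \, | \, y_{<t}, x)^{\alpha_t} \; p^{(2)}(y_t \, | \, y_{<t}, x)^{\beta_t}}{Z_t}, \qquad Z_t = \sum_{y} p^{(1)}(y \, | \, y_{<t}, x)^{\alpha_t} \; p^{(2)}(y \, | \, y_{<t}, x)^{\beta_t}, \qquad \gamma_t = -\log Z_t.$$

Under this reading, $\gamma_t$ is fixed by normalization once $\alpha_t$ and $\beta_t$ are chosen. Setting $(\alpha_t, \beta_t) = (1, 0)$ recovers $p^{(1)}$, and $\alpha_t = \beta_t = 1/2$ gives the normalized geometric mean of the two base models. The convention $\log(0) = -\infty$ means a token with zero probability under a base model that carries positive weight also has zero probability under the fused model.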

Figures (10)

  • Figure 1: Log-likelihood of the sequences produced by CP-Fuse and $\text{CP-}\Delta$, and their base models $p^{(1)}$ and $p^{(2)}$, at each generated token. We show a random generation from StarCoder models fine-tuned on the Python instructional dataset; see the experiments section for details.
  • Figure 2: Histogram of exactly matched substring lengths (above 40 characters) generated by CP-$\Delta$ and CP-Fuse for (a) the Python instructions and (b) the math abstracts datasets. We show the longest substring and one randomly sampled match above 40 characters.
  • Figure 3: (Same as Figure 1) Log-likelihood for the sequence produced by CP-Fuse and $\text{CP-}\Delta$, and the corresponding base models $p^{(1)}$ and $p^{(2)}$ at each token in greedy decoding. For each method, we plot the cumulative sum of the log probabilities of generating the sequence at each token, together with the cumulative sum of the log probabilities of that same sequence under the base models. Due to the balancing property, CP-Fuse achieves $\log p^{(1)}(y_{\leq t} \vert x) \approx \log p^{(2)}(y_{\leq t} \vert x)$ at all steps of the generation, indicating that the tokens produced by CP-Fuse are roughly equally likely under both base models, hence preventing the reproduction of memorized samples. In contrast, $\text{CP-}\Delta$ places significantly more weight on the second model $p^{(2)}$, as evidenced by the much higher log-likelihood of the generated tokens under $p^{(2)}$ compared to $p^{(1)}$. This increases the likelihood of reproducing memorized samples from $p^{(2)}$.
  • Figure 4: Evolution of the parameters $\alpha_t$ and $\beta_t$ during greedy decoding. We randomly sampled six examples of text generated by our method CP-Fuse, combining overfitted Phi-2 models on the math abstract dataset. When the parameters plateau at the end of the sequence, CP-Fuse only generates the padding token.
  • Figure 5: Example of text generated by the overfitted, copyright-infringing model, CP-Fuse, $\text{CP-}\Delta$, the early-stopped model, and the base model for the Python instructional dataset using StarCoder models.
  • ...and 5 more figures
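
The balancing diagnostic described in the captions for Figures 1 and 3 can be sketched as follows: score the same generated sequence under both base models, take cumulative sums of the per-token log-probabilities, and compare the two curves. The helper names (`cumulative_loglik`, `balance_gap`) and the numbers in the usage example are hypothetical; in practice the per-token log-probabilities would come from the two base models.

```python
import numpy as np

def cumulative_loglik(token_logprobs):
    """Cumulative sum of per-token log-probabilities, i.e. log p(y_<=t | x) for each t."""
    return np.cumsum(np.asarray(token_logprobs, dtype=float))

def balance_gap(logprobs_model1, logprobs_model2):
    """Absolute gap between the two cumulative log-likelihood curves at each step.

    A gap near zero at every step means the sequence is roughly equally
    likely under both base models.
    """
    c1 = cumulative_loglik(logprobs_model1)
    c2 = cumulative_loglik(logprobs_model2)
    return np.abs(c1 - c2)

# Hypothetical per-token log-probabilities of one generated sequence.
under_p1 = [-1.0, -2.0, -0.5]
under_p2 = [-2.0, -1.0, -0.5]
print(balance_gap(under_p1, under_p2))
```

A gap that stays small along the whole sequence corresponds to the balanced behavior attributed to CP-Fuse, while a gap that grows steadily corresponds to a sequence that one base model finds far more likely than the other.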

Theorems & Definitions (2)

  • Lemma 4.1
  • Lemma 4.2