Copyright-Protected Language Generation via Adaptive Model Fusion

Javier Abad, Konstantin Donhauser, Francesco Pinto, Fanny Yang

TL;DR

CP-Fuse is a post-hoc, inference-time defense against copyright infringement that adaptively fuses the logits of two models trained independently on disjoint sets of copyrighted data. Building on a separability assumption and the $k$-NAF framework, the method derives a fusion rule that depends on the sequence history and balances the contributions of the two base models, reducing regurgitation while preserving utility for text and code. Experiments on text (abstracts, stories) and code (Python) datasets show reductions of at least 25x in exact memorization, with utility competitive with or better than inference-time baselines. The approach also complements training-time defenses and remains robust to prefix-prompting extraction. The work provides a practical, modular safeguard that could scale to larger models and to partially separable settings, offering a concrete path toward safer deployment of copyright-sensitive LLMs.

Abstract

The risk of language models reproducing copyrighted material from their training data has led to the development of various protective measures. Among these, inference-time strategies that impose constraints via post-processing have shown promise in addressing the complexities of copyright regulation. However, they often incur prohibitive computational costs or suffer from performance trade-offs. To overcome these limitations, we introduce Copyright-Protecting Model Fusion (CP-Fuse), a novel approach that combines models trained on disjoint sets of copyrighted material during inference. In particular, CP-Fuse adaptively aggregates the model outputs to minimize the reproduction of copyrighted content, adhering to a crucial balancing property that prevents the regurgitation of memorized data. Through extensive experiments, we show that CP-Fuse significantly reduces the reproduction of protected material without compromising the quality of text and code generation. Moreover, its post-hoc nature allows seamless integration with other protective measures, further enhancing copyright safeguards. Lastly, we show that CP-Fuse is robust against common techniques for extracting training data.

Paper Structure

This paper contains 75 sections, 2 theorems, 15 equations, 24 figures, 13 tables.

Key Result

Lemma 3.1

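The optimal solution $p(y_t \, | \, y_{<t}, x)$ of the optimization problem in Equation eq:pt satisfies

$$\log p(y_t \, | \, y_{<t}, x) = \alpha_t \log p^{(1)}(y_t \, | \, y_{<t}, x) + \beta_t \log p^{(2)}(y_t \, | \, y_{<t}, x) + \gamma_t$$

for some $\alpha_t, \beta_t \geq 0, \gamma_t \in \mathbb{R}$, where we set $\log(0) = - \infty$.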
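To make the role of the three coefficients concrete, here is a minimal sketch of a single fusion step. It assumes the fused next-token distribution is log-linear in the two base models, with non-negative weights $\alpha_t, \beta_t$ and $\gamma_t$ acting as the log-normalizer. The function name `fuse_next_token` is hypothetical, and the weights are passed in by hand; how CP-Fuse actually selects them at each step (by solving its optimization problem) is not shown here.

```python
import math


def fuse_next_token(log_p1, log_p2, alpha, beta):
    """Fuse two next-token log-probability vectors log-linearly.

    Computes log p = alpha * log p1 + beta * log p2 + gamma, where gamma is
    chosen so that p sums to one. alpha and beta are assumed inputs here.
    """
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")

    scores = []
    for lp1, lp2 in zip(log_p1, log_p2):
        score = 0.0
        # Skip zero-weight terms so that 0 * (-inf) never produces NaN.
        if alpha > 0:
            score += alpha * lp1
        if beta > 0:
            score += beta * lp2
        scores.append(score)

    top = max(scores)
    if top == -math.inf:
        raise ValueError("fused distribution has no support")

    # gamma is minus the log-sum-exp of the unnormalized scores.
    gamma = -(top + math.log(sum(math.exp(s - top) for s in scores)))
    return [s + gamma for s in scores], gamma


# Illustrative inputs: each base model is overconfident in a different token.
log_p1 = [math.log(0.90), math.log(0.05), math.log(0.05)]
log_p2 = [math.log(0.05), math.log(0.90), math.log(0.05)]
fused, gamma = fuse_next_token(log_p1, log_p2, alpha=0.5, beta=0.5)
fused_probs = [math.exp(lp) for lp in fused]
```

With equal weights, a token that only one base model strongly prefers is down-weighted by the other model's low probability, which is the intuition behind balancing the two contributions. Setting `alpha=1, beta=0` recovers the first base model unchanged.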

Figures (24)

  • Figure 1: (Top) Illustration of the copyright-protecting fusion strategy. The left panel shows the training datasets $\mathcal{X}_1$ (in red) and $\mathcal{X}_2$ (in blue), each containing disjoint copyright sets $\mathcal{C}_1 \subset \mathcal{X}_1$ (in light red) and $\mathcal{C}_2 \subset \mathcal{X}_2$ (in light blue). The middle panel depicts our copyright-protecting fusion algorithm. The right panel displays the learned distributions of the potentially infringing models $\mathbf{p}^{(1)}$ (in red) and $\mathbf{p}^{(2)}$ (in blue), along with the resulting safe model $\mathbf{p}$ (in green). Lighter regions indicate areas of lower probability; although the safe model still retains “access” to the copyrighted content, the probability of regurgitating it is very low. (Bottom) Generations from $\mathbf{p}^{(1)}$, $\mathbf{p}^{(2)}$, and $\mathbf{p}$ given the same prompt; the first two generations reproduce copyrighted material, whereas $\mathbf{p}$ generates an original story.
  • Figure 2: Log-likelihood of sequences produced by CP-Fuse and CP-$\Delta$, and their base models $p^{(1)}$ and $p^{(2)}$, at each generated token. We show a random generation from StarCoder fine-tuned on the Python instructions; details are in the experiments section.
  • Figure 3: Histogram of exactly matched substring lengths (above 40 characters) generated by CP-$\Delta$ and CP-Fuse for (a) the Python instructions and (b) the MathAbstracts datasets. We show the longest substring and one randomly sampled match above 40 characters.
  • Figure 4: Examples of typical errors in code generated by MemFree for the APPS dataset. The highlighted characters indicate the code affected by filtering, leading to errors. On the left, MemFree changes a variable name, resulting in a syntax error. On the right, MemFree alters the logic of the code, producing incorrect output, whereas CP-Fuse successfully solves the problem.
  • Figure 5: Metrics for copyright infringement, exact matching (EM), and BLEU score (BLE) on the WritingPrompts dataset for (a) models trained with goldfish loss (GL) with CP-Fuse applied as a wrapper, and (b) the effect of the prefix length on CP-Fuse applied to overfitted models in split 1.
  • ...and 19 more figures
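
The Figure 3 caption refers to exactly matched substrings longer than 40 characters. As a rough illustration of how such a match could be measured, the sketch below finds the longest exact character-level substring shared by a generation and one reference text. The helper names and the single-reference setup are assumptions for illustration, not the paper's evaluation code.

```python
from difflib import SequenceMatcher


def longest_exact_match(generated: str, reference: str) -> str:
    """Return the longest contiguous substring shared by both strings."""
    # autojunk=False disables difflib's heuristic that ignores frequent
    # characters in long inputs, which would otherwise hide real matches.
    matcher = SequenceMatcher(None, generated, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(generated), 0, len(reference))
    return generated[match.a : match.a + match.size]


def exceeds_threshold(generated: str, reference: str, min_chars: int = 40) -> bool:
    """Flag a generation whose longest exact match is above min_chars."""
    return len(longest_exact_match(generated, reference)) > min_chars
```

In practice one would compare each generation against every protected training sample and keep the longest match; this sketch handles only one pair at a time.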

Theorems & Definitions (3)

  • Definition 3.1
  • Lemma 3.1
  • Lemma 3.2