Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad, Konstantin Donhauser, Francesco Pinto, Fanny Yang
TL;DR
This work introduces CP-Fuse, an adaptive model-fusion algorithm that combines the logits of multiple base language models to minimize the reproduction of memorized copyrighted content while preserving generation quality. Grounded in the Near-Access Free (NAF) framework and a separability assumption, CP-Fuse derives a logit-update rule in which the next-token probability is a balanced combination of the base models' predictions, with the combination weights chosen by grid search. The authors prove a balancing property and show, on code and text tasks with heavily overfitted base models, that CP-Fuse reduces exact and approximate memorization by more than 25x relative to the infringement-prone models while maintaining competitive perplexity and output quality. CP-Fuse can also be applied on top of any model and integrated with other copyright-protection techniques, which makes it a practical wrapper for strengthening safeguards in deployed models. Future directions include relaxing the full separability assumption and combining CP-Fuse with additional mitigation methods.
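To make the idea of a grid-searched, balanced combination of two models' next-token predictions concrete, here is a minimal single-step sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`log_softmax`, `kl`, `fuse_step`), the use of a single weight `w`, and the minimax-KL objective are all hypothetical choices made for this example. It also ignores any sequence-level terms a full method might track across decoding steps.

```python
import math


def log_softmax(logits):
    """Convert raw logits to normalized log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl(log_p, log_q):
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def fuse_step(logits_1, logits_2, grid_size=11):
    """Return (weight, fused log-probs) for one next-token step.

    ASSUMPTION: a single weight w in [0, 1] and a minimax-KL objective are
    illustrative choices for this sketch, not details taken from the paper.
    """
    lp1, lp2 = log_softmax(logits_1), log_softmax(logits_2)
    best_w, best_lp, best_cost = None, None, float("inf")
    for i in range(grid_size):
        w = i / (grid_size - 1)
        # Weighted combination in log space, then renormalize.
        fused = log_softmax([w * a + (1 - w) * b for a, b in zip(lp1, lp2)])
        # Penalize drifting too far from either base model.
        cost = max(kl(fused, lp1), kl(fused, lp2))
        if cost < best_cost:
            best_w, best_lp, best_cost = w, fused, cost
    return best_w, best_lp


# Two toy models that each strongly prefer a different token.
w, fused = fuse_step([2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
probs = [math.exp(lp) for lp in fused]
```

In this symmetric toy case the search settles on `w = 0.5`, and the fused distribution gives each model's favored token a lower probability than that model assigned on its own, which is the intuition behind balancing: no single model's strongly preferred (possibly memorized) continuation dominates.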
Abstract
The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimize the reproduction of protected materials. CP-Fuse is inspired by the recently proposed Near-Access Free (NAF) framework and additionally incorporates a desirable balancing property that, as we demonstrate, prevents the reproduction of memorized training data. Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation. Furthermore, we demonstrate how CP-Fuse can be integrated with other techniques for enhanced protection.
