Avoiding Copyright Infringement via Large Language Model Unlearning
Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong
TL;DR
The paper tackles the challenge of copyright infringement in large language models by proposing Stable Sequential Unlearning (SSU), a method to forget copyrighted content across multiple time steps without retraining from scratch. SSU uses stable task vectors, random labeling loss, and a gradient-based weight saliency map to limit updates to the most relevant parameters, enabling effective unlearning while preserving general knowledge and language abilities. In experiments on Llama-3.1-8B-Instruct and Mistral-7B-Instruct, SSU outperforms baselines (including NPO and Gradient Difference) in the trade-off between reducing copyright leakage (measured with ROUGE-based metrics) and maintaining MMLU/MT-Bench performance, though some unintended knowledge loss and re-emergence of unlearned content remain challenges. The work highlights the practical viability of sequential copyright takedown in production LLMs and discusses robustness, limitations, and avenues for future improvement, such as combining unlearning with generation-time safeguards and data-tracing tools.
Abstract
Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities but also pose risks by learning and generating copyrighted material, leading to significant legal and ethical concerns. In real-world scenarios, model owners need to continuously address copyright infringement as new requests for content removal emerge at different time points. This creates a need for sequential unlearning, in which copyrighted content is removed step by step as new requests arise. Despite its practical relevance, sequential unlearning in the context of copyright infringement has not been rigorously explored in existing literature. To address this gap, we propose Stable Sequential Unlearning (SSU), a novel framework designed to unlearn copyrighted content from LLMs over multiple time steps. Our approach works by identifying and removing specific weight updates in the model's parameters that correspond to copyrighted content. We improve unlearning efficacy by introducing a random labeling loss, and we ensure that the model retains its general-purpose knowledge by adjusting only targeted parameters. Experimental results show that SSU achieves an effective trade-off between unlearning efficacy and general-purpose language abilities, outperforming existing baselines.
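The idea of removing only the weight updates tied to copyrighted content can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a model reduced to a flat list of weights, and the function names (`saliency_mask`, `negate_task_vector`), the threshold, and all numbers are hypothetical. It assumes that a copy of the model has already been fine-tuned on the content to forget, and it skips the fine-tuning step and the random labeling loss entirely.

```python
from typing import List


def saliency_mask(grads: List[float], threshold: float) -> List[int]:
    """Return 1 where the gradient magnitude reaches the threshold, else 0.

    Assumption: large gradients on the forget data mark the parameters
    most relevant to that data.
    """
    return [1 if abs(g) >= threshold else 0 for g in grads]


def negate_task_vector(
    base: List[float], finetuned: List[float], mask: List[int]
) -> List[float]:
    """Subtract the masked task vector (finetuned - base) from the base weights.

    Parameters with mask 0 are left untouched, which is how this sketch
    limits the update to the selected parameters.
    """
    return [b - m * (f - b) for b, f, m in zip(base, finetuned, mask)]


# Hypothetical toy values: four weights of a "model".
base = [0.5, -1.0, 2.0, 0.0]        # weights before unlearning
finetuned = [0.75, -1.0, 1.5, 0.25]  # weights after fine-tuning on the forget data
grads = [0.5, 0.0, -0.75, 0.125]     # gradients of the forget loss

mask = saliency_mask(grads, threshold=0.25)
unlearned = negate_task_vector(base, finetuned, mask)

print(mask)       # [1, 0, 1, 0]
print(unlearned)  # [0.25, -1.0, 2.5, 0.0]
```

In a sequential setting, the same step would be repeated for each new removal request, starting from the weights produced by the previous step. The last weight shows the effect of the mask: fine-tuning moved it, but its gradient fell below the threshold, so it stays unchanged.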
