Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning
Guangyao Dou
TL;DR
This work tackles the risk of copyright infringement in pre-trained LLMs by studying sequential unlearning, where copyrighted content is removed over time as takedown requests arrive. It introduces Stable Sequential Unlearning (SSU), which learns stable task vectors and combines them with a random labeling loss and a gradient-based weight saliency mechanism that localizes updates, limiting collateral damage to non-targeted knowledge and general language abilities. In experiments on Llama-3.1-8B-Instruct and Mistral-7B-Instruct-v0.3 using Gutenberg books, SSU generally achieves a favorable trade-off between reducing copyright risk (lower Rouge-1 and Rouge-L on the data being forgotten and on previously forgotten data) and preserving general-purpose capabilities (MMLU, MT-Bench). It outperforms several baselines but does not eliminate all risks. The results underscore both the potential of principled unlearning for copyright takedowns and the need for further work, including robust evaluation, certified guarantees, and complementary measures beyond unlearning to address copyright concerns in generative AI systems.
Abstract
Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities but also pose risks by learning and generating copyrighted material, leading to significant legal and ethical concerns. In a potential real-world scenario, model owners may need to handle copyright infringement continuously, responding to content removal requests that emerge at different points in time. One potential way of doing this is sequential unlearning, where copyrighted content is removed step by step as new requests arise. Despite its practical relevance, sequential unlearning in the context of copyright infringement has not been rigorously explored in the existing literature. To address this gap, we propose Stable Sequential Unlearning (SSU), a novel framework designed to unlearn copyrighted content from LLMs over multiple time steps. Our approach uses task vectors to identify and remove the specific weight updates in the model's parameters that correspond to copyrighted content. We improve unlearning efficacy by introducing a random labeling loss, and we help the model retain its general-purpose knowledge by adjusting only targeted parameters selected with gradient-based weight saliency. Extensive experimental results show that SSU sometimes achieves an effective trade-off between unlearning efficacy and general-purpose language abilities, outperforming existing baselines, but it is not a cure-all for unlearning copyrighted material.
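The task-vector and weight-saliency ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that a task vector is the difference between weights fine-tuned on the content to forget and the weights before fine-tuning, that unlearning subtracts this vector, and that saliency is a simple threshold on gradient magnitude. The function names (`saliency_mask`, `negate_task_vector`), the threshold value, and the toy arrays are hypothetical, and the fine-tuning step with the random labeling loss is not shown.

```python
import numpy as np

def saliency_mask(grads, threshold):
    """Return a 0/1 mask per parameter: 1 where |gradient| >= threshold.

    Assumption: saliency is plain gradient magnitude on the forget data.
    """
    return {name: (np.abs(g) >= threshold).astype(g.dtype)
            for name, g in grads.items()}

def negate_task_vector(prev_weights, finetuned_weights, mask):
    """Subtract the masked task vector from the previous weights.

    task vector = finetuned - previous; only salient entries are changed.
    """
    unlearned = {}
    for name, w in prev_weights.items():
        tau = finetuned_weights[name] - w
        unlearned[name] = w - mask[name] * tau
    return unlearned

# Toy "model" with one parameter tensor (illustrative values only).
prev = {"w": np.array([1.0, 2.0, 3.0])}
# Hypothetical weights after fine-tuning on the content to forget.
finetuned = {"w": np.array([1.5, 2.0, 2.0])}
# Hypothetical gradients of the forgetting loss with respect to the weights.
grads = {"w": np.array([0.9, 0.01, 0.5])}

mask = saliency_mask(grads, threshold=0.1)
unlearned = negate_task_vector(prev, finetuned, mask)
```

In a sequential setting, `unlearned` would serve as `prev` for the next removal request, so each step starts from the model produced by the previous one. The mask leaves the second entry unchanged because its gradient falls below the threshold, which is the sense in which saliency localizes the update.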
