SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
Zhenliang Zhang, Xinyu Hu, Xiaojun Wan
TL;DR
This work reframes copyright infringement mitigation in LLMs as intrinsic semantic-space control and introduces SCOPE, a two-stage, inference-time method that first identifies a copyright-sensitive subspace in the sparse latent space of a sparse autoencoder (SAE) and then clamps that subspace's activations during decoding. Because it operates on semantics rather than surface-level token filters, SCOPE substantially reduces regurgitation of copyrighted content while preserving general utility, as validated on the NewsQA, BookSum, and MMLU benchmarks. The core contributions are a subspace hypothesis, a Copyright Alignment Score used to empirically identify a compact subspace, and evidence of both causal control (via reverse intervention) and interpretability of the copyright-related semantics. The approach is a lightweight, filter-free mechanism for copyright protection with practical implications for deployment, though it requires open models with accessible intermediate representations and assumes the sensitive semantics lie in a linear subspace. Overall, SCOPE shows that targeted, semantic-level interventions at decoding time can reconcile copyright risk with task performance in LLMs.
Abstract
Large language models sometimes inadvertently reproduce copyrighted passages, exposing downstream applications to legal risk. Most existing inference-time defences focus on surface-level token matching and rely on external blocklists or filters, which add deployment complexity and may overlook semantically paraphrased leakage. In this work, we reframe copyright infringement mitigation as intrinsic semantic-space control and introduce SCOPE, an inference-time method that requires no parameter updates or auxiliary filters. Specifically, a sparse autoencoder (SAE) projects hidden states into a high-dimensional, near-monosemantic space; within this representation, we identify a copyright-sensitive subspace and clamp its activations during decoding. Experiments on widely recognized benchmarks show that SCOPE mitigates copyright infringement without degrading general utility. Further interpretability analyses confirm that the isolated subspace captures high-level semantics.
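
The clamping step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the ReLU encoder form, the parameters `sensitive_idx` and `clamp_value`, and the choice to add back the SAE reconstruction error are all assumptions made for illustration; how the sensitive latent indices are selected is not shown here.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc):
    # Assumed SAE encoder: linear map followed by ReLU, giving sparse latents.
    return np.maximum(0.0, h @ W_enc + b_enc)

def sae_decode(z, W_dec, b_dec):
    # Assumed SAE decoder: linear map back to the model's hidden size.
    return z @ W_dec + b_dec

def clamp_subspace(h, W_enc, b_enc, W_dec, b_dec, sensitive_idx, clamp_value=0.0):
    """Cap selected SAE latents at clamp_value and return the edited hidden state.

    sensitive_idx is a hypothetical list of latent indices assumed to span
    the copyright-sensitive subspace.
    """
    idx = np.asarray(sensitive_idx, dtype=int)
    z = sae_encode(h, W_enc, b_enc)
    # Keep the part of h that the SAE cannot reconstruct, so that only the
    # selected latents change (an assumption, not a detail from the paper).
    error = h - sae_decode(z, W_dec, b_dec)
    z_clamped = z.copy()
    z_clamped[..., idx] = np.minimum(z_clamped[..., idx], clamp_value)
    return sae_decode(z_clamped, W_dec, b_dec) + error

# Toy usage with random weights standing in for a trained SAE.
rng = np.random.default_rng(0)
hidden_size, latent_size = 8, 32
W_enc = rng.normal(size=(hidden_size, latent_size))
b_enc = rng.normal(size=latent_size)
W_dec = rng.normal(size=(latent_size, hidden_size))
b_dec = rng.normal(size=hidden_size)
h = rng.normal(size=(3, hidden_size))  # three token positions
h_edited = clamp_subspace(h, W_enc, b_enc, W_dec, b_dec, sensitive_idx=[1, 5])
```

In this sketch, an empty `sensitive_idx` leaves the hidden state unchanged, and with `clamp_value=0.0` the edit removes exactly the contribution of the selected latents through their decoder rows. In an actual deployment the edited hidden state would be written back into the model's forward pass at the chosen layer during decoding.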
