Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

Jacqueline He, Jonathan Hayase, Wen-tau Yih, Sewoong Oh, Luke Zettlemoyer, Pang Wei Koh

TL;DR

Anchored Decoding addresses the risk of verbatim copyright reproduction in language models through a plug-in, inference-time fusion of a risky model $p_r$ and a safe anchor $p_s$ under a global $K$-NAF budget. At each step the method computes a fused distribution as a closed-form geometric mean of $p_r$ and $p_s$, and it uses a prefix debt and an adaptive budgeting strategy to enforce the sequence-level bound while preserving utility. A byte-level variant, Anchored$_{\mathrm{Byte}}$ Decoding, extends the approach to mismatched tokenizers using ByteSampler while maintaining the $K$-NAF guarantee. Across six model pairs, Anchored Decoding defines a new Pareto frontier: it removes up to 75% of the copying gap between the risky baseline and the safe reference while keeping fluency and factuality near baseline levels, at modest computational overhead. The work provides a practical, theoretically grounded framework for constraining high-capability generators to a trusted reference distribution, with potential applicability beyond copyright mitigation.
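
The per-step fusion described above can be illustrated with a small sketch. It assumes the per-step constraint is $\mathrm{KL}(p \| p_s) \le k$, that both distributions have full support, and that the mixing weight is found by bisection; the function names (`kl`, `geometric_fuse`, `anchored_step`) and the toy distributions are hypothetical and not taken from the paper, which describes a closed-form solution.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def geometric_fuse(p_safe, p_risky, lam):
    """Normalized geometric mean: p proportional to p_safe^(1-lam) * p_risky^lam.

    Assumes every probability is strictly positive (illustrative assumption).
    """
    logits = [(1.0 - lam) * math.log(s) + lam * math.log(r)
              for s, r in zip(p_safe, p_risky)]
    top = max(logits)                      # subtract the max for stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def anchored_step(p_safe, p_risky, budget, iters=50):
    """Return the fused distribution with the largest lam in [0, 1]
    whose KL to the safe model stays within `budget` (hypothetical helper)."""
    if kl(p_risky, p_safe) <= budget:
        return list(p_risky), 1.0          # the risky model already fits
    lo, hi = 0.0, 1.0                      # lo always satisfies the constraint
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(geometric_fuse(p_safe, p_risky, mid), p_safe) <= budget:
            lo = mid
        else:
            hi = mid
    return geometric_fuse(p_safe, p_risky, lo), lo

# Toy next-token distributions over a 4-token vocabulary (made-up numbers).
p_safe = [0.4, 0.3, 0.2, 0.1]
p_risky = [0.01, 0.01, 0.01, 0.97]         # sharply peaked, as in memorization
fused, lam = anchored_step(p_safe, p_risky, budget=0.5)
print(lam, kl(fused, p_safe))
```

Bisection is valid here because the KL divergence from the geometric mean to the safe distribution does not decrease as the weight on the risky model grows from 0 to 1, so the largest feasible weight is well defined.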

Abstract

Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored$_{\mathrm{Byte}}$ Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored$_{\mathrm{Byte}}$ Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.
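
To make the idea of spreading a sequence-level budget over per-step constraints concrete, here is a minimal sketch. The allocation rule (an even split of whatever budget remains, with unspent allowance carried forward) is a hypothetical stand-in chosen for illustration, not the paper's adaptive strategy, and `step_divergences` are made-up per-step divergences between the risky and safe models.

```python
def allocate_budget(step_divergences, total_budget):
    """Spend at most `total_budget` across all steps.

    At step t the cap is the remaining budget divided evenly over the
    remaining steps (hypothetical rule). A step spends the smaller of
    its cap and what the risky model would need; the rest is banked.
    """
    remaining = total_budget
    n_steps = len(step_divergences)
    spent = []
    for t, needed in enumerate(step_divergences):
        cap = remaining / (n_steps - t)    # per-step constraint
        used = min(needed, cap)            # never exceed the cap
        spent.append(used)
        remaining -= used                  # unused allowance rolls forward
    return spent

# Three cheap steps and one expensive step (illustrative values only).
spent = allocate_budget([0.1, 0.1, 3.0, 0.1], total_budget=2.0)
print(spent, sum(spent))
```

Because each step spends no more than its cap, and each cap is no more than the remaining budget, the total spent can never exceed `total_budget`. Banking lets the expensive third step use more than the naive even share of 0.5.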

Paper Structure

This paper contains 93 sections, 11 theorems, 34 equations, 12 figures, 21 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $p^*$ be a sequence-level distribution defined autoregressively by $p^*(y_{<T} \vert x) = \prod_{t=0}^{T-1} p_t^*(y_t \vert y_{<t}, x)$. If, for all decoding steps $t<T_{\max}$, the conditional distribution $p_t^*$ solves the token-level optimization problem (eq:token_level_opt) with a per-step budget $k_t$ such that $\sum_{t=0}^{T_{\max}-

Figures (12)

  • Figure 1: (a) Given the opening line of J.R.R. Tolkien's The Fellowship of the Ring (1954), the risky LM outputs its verbatim continuation, while the safe LM produces a less fluent, repetitive alternative. Anchored Decoding generates in bounded proximity to the safe LM within a budget $K$, while leveraging utility from the risky LM, and produces a plausible, non-infringing continuation. (b) With the safe-risky LM pair {TinyComma 1.8B, Llama 3.1 70B}, Anchored Decoding (in purple) achieves the best risk-utility tradeoff.
  • Figure 2: Anchored$_{\mathrm{Byte}}$ Decoding (in purple) achieves the best risk-utility tradeoff at the byte level across five model pairs. We report the average of three seeds; error bars show standard deviation. The shaded threshold denotes the high-protection operating point, where the Normalized Copyright Reduction (NCR) $\geq 75\%$. NCR and fluency are evaluated on Books, and factuality on Bios.
  • Figure 3: Risk-utility tradeoffs for Anchored Decoding ablations. We ablate three axes: (i) optimization objective, (ii) prefix debt, and (iii) budgeting strategy. For brevity, our methods are labeled as Anc. Dec.
  • Figure 4: Top: Per-step $\mathrm{KL}(p_r\|p_s)$ histogram when sampling from $p_r$, conditioned on prefixes from different domains. The Copyright domain is more right-shifted than the Creative and Factual domains. Bottom: Unconditional CCDF of per-step $\mathrm{KL}(p_r\|p_s)$, shown for $x \ge q_{90}$. $q_{90}$ is computed from per-step KL values pooled across domains (shared cutoff per panel). The Copyright domain has a heavier extreme tail than the others.
  • Figure 5: High-copying regions are front-loaded under both byte-level and token-level decoding. We plot histograms (bin width of 5) of the start position of copying metrics (LCS and ACS) on Copyright generations. Copying tends to cluster at early positions.
  • ...and 7 more figures
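
The Normalized Copyright Reduction (NCR) threshold in Figure 2 can be read as the fraction of the copying gap between the risky baseline and the safe reference that a method closes. The sketch below uses that reading as an assumption; the exact formula is not given in this listing, and the function names and scores are hypothetical.

```python
def normalized_copyright_reduction(risky, safe, method):
    """Fraction of the risky-to-safe copying gap closed by `method`.

    Assumed form: (risky - method) / (risky - safe), where each argument
    is a copying score and higher means more copying.
    """
    gap = risky - safe
    if gap <= 0:
        raise ValueError("risky score must exceed safe score")
    return (risky - method) / gap

def mean_ncr(rows):
    """Average NCR over several copying metrics.

    `rows` is a list of (risky, safe, method) score triples, one per metric.
    """
    values = [normalized_copyright_reduction(r, s, m) for r, s, m in rows]
    return sum(values) / len(values)

# Made-up scores for two copying metrics; not results from the paper.
rows = [(0.8, 0.2, 0.35), (40.0, 10.0, 16.0)]
print(mean_ncr(rows))
```

Under this reading, a value of 1 means the method copies no more than the safe reference and 0 means it copies as much as the risky baseline.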

Theorems & Definitions (20)

  • Definition 2.0
  • Theorem 3.1: Safety of local approximation
  • Corollary 3.2: Constant per-step cap
  • Proposition 3.2: Solving for $p_t^*$
  • Proposition 3.2: Global safety of adaptive banking
  • Theorem 2.1: Safety of local approximation
  • proof
  • Lemma 2.1: Interior optimality on the common support
  • proof
  • Proposition 2.1: Solving for $p_t^*$
  • ...and 10 more