How Good is Post-Hoc Watermarking With Language Model Rephrasing?

Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, Alexandre Mourachko

TL;DR

Post-hoc watermarking via LLM paraphrasing embeds detectable signals into existing text, supporting copyright-protection and data-protection use cases without changing how models are deployed at generation time. The authors systematically evaluate multiple watermarking schemes (e.g., Green-red, Gumbel-max, SynthID, DiPMark, MorphMark), decoding strategies, and compute budgets across prose, encyclopedic, and code domains. They find that Gumbel-max often provides the best quality-detectability frontier under random sampling, that beam search substantially improves performance for most schemes, and that smaller, higher-entropy models can outperform larger ones when strong watermarks are required. Code remains difficult, because its strict correctness constraints limit detectability. The work contributes a large-scale empirical framework, key-sensitivity analyses, and open-source tooling to guide practical deployment and future research.
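To make the Green-red scheme concrete, here is a minimal sketch in the style of the green-list watermark of Kirchenbauer et al.: a secret key and the previous token seed a pseudo-random "green" subset covering a fraction `gamma` of the vocabulary, `delta` is added to green logits before sampling, and detection counts green tokens with a one-proportion z-test. The SHA-256 seeding, the one-token context window, the parameter values, and the helper names (`green_list`, `bias_logits`, `sample`, `detect_z`) are illustrative assumptions, not the paper's configuration. Flat logits stand in for the rephrasing model.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, key: int) -> set[int]:
    """Select a pseudo-random fraction gamma of the vocabulary as 'green'.

    Assumption: the seed mixes a secret key with the previous token only.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def bias_logits(logits: list[float], prev_token: int, gamma: float,
                delta: float, key: int) -> list[float]:
    """Add delta to every green token's logit before sampling."""
    green = green_list(prev_token, len(logits), gamma, key)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample(logits: list[float], rng: random.Random) -> int:
    """Draw one token index from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]


def detect_z(tokens: list[int], vocab_size: int, gamma: float, key: int) -> float:
    """One-proportion z-test: how many more green tokens than the chance rate gamma?"""
    n = len(tokens) - 1  # the first token has no context to seed its green list
    hits = sum(tokens[t] in green_list(tokens[t - 1], vocab_size, gamma, key)
               for t in range(1, len(tokens)))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Toy run (hypothetical parameters): flat logits stand in for the rephrasing model.
KEY, GAMMA, DELTA, V = 42, 0.25, 4.0, 100
rng = random.Random(0)
tokens = [0]
for _ in range(200):
    tokens.append(sample(bias_logits([0.0] * V, tokens[-1], GAMMA, DELTA, KEY), rng))
print(f"z = {detect_z(tokens, V, GAMMA, KEY):.1f}")  # expected to be well above 4
```

With flat logits every token is a free choice, so the bias is maximally effective; real rephrasing has many near-deterministic tokens, which is where the quality-detectability trade-off studied in the paper comes from.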

Abstract

Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking*, where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents or to detect their use in training or retrieval-augmented generation (RAG) via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and the limitations of post-hoc watermarking, laying the groundwork for practical applications and future research.
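For readers unfamiliar with Gumbel-max watermarking, the sketch below shows the usual formulation (often attributed to Aaronson): a key and the context seed one uniform value r_v per vocabulary entry, the sampler emits argmax_v r_v^(1/p_v), and the detector sums -log(1 - r) over emitted tokens. Under the null hypothesis each term is Exp(1), so the sum follows a Gamma(n, 1) distribution and yields an exact p-value. The seeding scheme, the one-token context, the random stand-in logits, and the helper names are assumptions for illustration; this is not the paper's implementation.

```python
import hashlib
import math
import random


def uniforms(key: int, prev_token: int, vocab_size: int) -> list[float]:
    """One pseudo-random value in (0, 1) per vocabulary entry, seeded by key and context."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # (k + 0.5) / 2**52 lies strictly inside (0, 1), so both logs below stay finite
    return [(rng.getrandbits(52) + 0.5) / 2**52 for _ in range(vocab_size)]


def gumbel_max_step(probs: list[float], prev_token: int, key: int) -> int:
    """Pick argmax_v r_v ** (1 / p_v), computed stably as argmax_v log(r_v) / p_v."""
    r = uniforms(key, prev_token, len(probs))
    return max((v for v, p in enumerate(probs) if p > 0),
               key=lambda v: math.log(r[v]) / probs[v])


def detect_pvalue(tokens: list[int], vocab_size: int, key: int) -> float:
    """Score S = sum_t -log(1 - r_{x_t}); under H0, S ~ Gamma(n, 1)."""
    n = len(tokens) - 1
    s = sum(-math.log(1.0 - uniforms(key, tokens[t - 1], vocab_size)[tokens[t]])
            for t in range(1, len(tokens)))
    # Erlang survival: P(S >= s) = exp(-s) * sum_{k<n} s**k / k!, summed in log space
    log_terms = [k * math.log(s) - math.lgamma(k + 1) - s for k in range(n)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(x - m) for x in log_terms))


# Toy run (hypothetical): random logits stand in for the rephrasing model.
KEY, V = 7, 100
rng = random.Random(0)
tokens = [0]
for _ in range(200):
    logits = [rng.gauss(0.0, 2.0) for _ in range(V)]
    z = sum(math.exp(x) for x in logits)
    tokens.append(gumbel_max_step([math.exp(x) / z for x in logits], tokens[-1], KEY))
print(f"p-value = {detect_pvalue(tokens, V, KEY):.2e}")
```

Because the choice is a deterministic function of the key, the context, and the model's distribution, a fixed context and distribution always produce the same token; the per-step output distribution still matches the model's when averaged over keys.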

Paper Structure

This paper contains 58 sections, 8 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Post-hoc text watermarking through watermarked LLM rephrasing. We empirically evaluate detection power, semantic fidelity, and correctness under different design choices, such as the watermarking scheme and the available compute (through the paraphrasing model and the decoding strategy).
  • Figure 2: Pareto fronts of watermarking methods showing the trade-off between quality and watermark strength using Llama-3.2-3B-Instruct. Each point corresponds to a different parameter configuration, with values representing medians across 100 rephrased passages. Experimental details are given in the experimental-setup and Pareto-front subsections.
  • Figure 3: Impact of model family and size. Cross entropy vs. watermark strength. Larger models improve quality for a given watermark strength, but small models are necessary to reach high strengths. All families are comparable, except Gemma-3, which is not suitable. Experimental details are given in the experimental-setup and model-scale subsections.
  • Figure 4: Beam search improves the Pareto frontier. Cross entropy vs. watermark strength for suitable methods. Beam search, especially with biased scoring (see the method section), shifts the frontier upward, substantially improving rephrasing quality at a fixed watermark strength. Experimental details are given in the experimental-setup and beam-search subsections.
  • Figure 5: Effect of entropy-aware detection. Left: share of configurations for which some threshold improves detection by more than 5% on at least half of the texts. Middle: median relative improvement for those configurations. Right: for every configuration, fraction of texts helped at the optimal threshold vs. median improvement; dashed lines mark the 50% and +5% criteria. Experimental details are given in the experimental-setup and entropy-ablation subsections.
  • ...and 4 more figures
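Entropy-aware detection, as summarized in Figure 5, can be illustrated with a Green-red z-test that scores only tokens whose next-token entropy reaches a threshold `tau`: near-deterministic tokens could not be steered by the watermark, so they only dilute the statistic. This is a minimal sketch under stated assumptions; in practice the entropies would come from a scoring model's next-token distributions (for instance via `entropy()` below), and the toy hit counts, entropy values, and threshold are hypothetical, not taken from the paper.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def filtered_z(green_hits: list[bool], entropies: list[float],
               tau: float, gamma: float) -> float:
    """Green-red z-score computed only on tokens whose entropy is at least tau."""
    kept = [hit for hit, h in zip(green_hits, entropies) if h >= tau]
    n = len(kept)
    if n == 0:
        return 0.0  # nothing left to score
    return (sum(kept) - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Hypothetical data: 20 high-entropy tokens (18 green) and 20 near-deterministic
# tokens that are green only at the chance rate gamma = 0.25 (5 of 20).
hits = [True] * 18 + [False] * 2 + [True] * 5 + [False] * 15
ents = [3.0] * 20 + [0.1] * 20
print(f"all tokens: z = {filtered_z(hits, ents, tau=0.0, gamma=0.25):.2f}")  # 4.75
print(f"filtered:   z = {filtered_z(hits, ents, tau=1.0, gamma=0.25):.2f}")  # 6.71
```

Filtering trades sample size for signal density, so an overly high `tau` can discard too many tokens and lower the z-score; this is consistent with the figure reporting gains only for some thresholds and configurations.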