How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, Alexandre Mourachko
TL;DR
Post-hoc watermarking via LLM paraphrasing embeds detectable signals into existing text, supporting copyright-protection and data-protection use cases without modifying generation-time deployment. The authors systematically evaluate multiple watermarking schemes (e.g., Green-red, Gumbel-max, SynthID, DiPMark, MorphMark), various decoding strategies, and compute budgets across prose, encyclopedic, and code domains. They find that Gumbel-max often provides the best quality-detectability frontier under random sampling, that beam search substantially improves performance for most schemes, and that smaller models with higher entropy can outperform larger models when strong watermarks are required, although code imposes strict correctness constraints that limit detectability. The work contributes a large-scale empirical framework, key-sensitivity analyses, and open-source tooling to guide practical deployment and future research.
Abstract
Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking* where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents, or detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
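To make "detectability" concrete, the following is a minimal sketch of how detection could work for a green-red style watermark, one of the schemes named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the SHA-256 seeding, the one-token context window, the green fraction `gamma`, and the function names `is_green` and `green_red_z_score` are all hypothetical choices made here. The sketch also omits refinements such as de-duplicating repeated (context, token) pairs and the entropy filtering mentioned in the abstract.

```python
import hashlib
import math


def is_green(prev_token: int, token: int, key: int, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign `token` to the green list.

    Assumption: the partition is seeded by the secret key and the previous
    token only (context window of 1), using SHA-256 as the hash.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare with gamma.
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def green_red_z_score(tokens: list[int], key: int, gamma: float = 0.5) -> float:
    """z-score of the green-token count under the no-watermark hypothesis.

    Without a watermark, each scored token is green with probability gamma,
    so the count is approximately Binomial(n, gamma).
    """
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        raise ValueError("need at least two tokens")
    greens = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    return (greens - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

A large positive z-score suggests that the text was produced with the same key. In the post-hoc setting, the rephrasing model would bias its sampling toward green tokens while rewriting the input, and a detector holding the key would run a test of this kind on the rewritten text.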
