Watermarking Autoregressive Image Generation

Nikola Jovanović, Ismail Labiad, Tomáš Souček, Martin Vechev, Pierre Fernandez

TL;DR

This work addresses the challenge of watermarking autoregressive image generation at the token level, where a lack of reverse cycle-consistency (RCC) can erase the embedded signal during re-tokenization. It adapts generation-time LLM watermarks (KGW) to image tokens and introduces a lightweight RCC finetuning procedure for the detokenizer and an encoder replica, plus a post-hoc watermark synchronization layer to counter geometric transformations. The approach yields robust watermark detection with theoretically grounded $p$-values without harming generation quality, and extends to interleaved modalities and a preliminary audio case study. Overall, the method advances provenance tracing for multimodal autoregressive generation, offering stronger robustness than post-hoc baselines and enabling joint detection across modalities with scalable computation.

Abstract

Watermarking the outputs of generative models has emerged as a promising approach for tracking their provenance. Despite significant interest in autoregressive image generation models and their potential for misuse, no prior work has attempted to watermark their outputs at the token level. In this work, we present the first such approach by adapting language model watermarking techniques to this setting. We identify a key challenge: the lack of reverse cycle-consistency (RCC), wherein re-tokenizing generated image tokens significantly alters the token sequence, effectively erasing the watermark. To address this and to make our method robust to common image transformations, neural compression, and removal attacks, we introduce (i) a custom tokenizer-detokenizer finetuning procedure that improves RCC, and (ii) a complementary watermark synchronization layer. As our experiments demonstrate, our approach enables reliable and robust watermark detection with theoretically grounded p-values. Code and models are available at https://github.com/facebookresearch/wmar.
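
To make the notion of detection with p-values concrete, the following is a minimal sketch of KGW-style green-list detection on a token sequence. It is not the authors' implementation: the secret key, the SHA-256 based green-list assignment, the one-token context window, and the green fraction `GAMMA` are all assumptions chosen for illustration. Repeated (context, token) pairs are skipped, mirroring the "ignored as a duplicate" tokens mentioned in the Figure 2 caption.

```python
import hashlib
from math import comb

GAMMA = 0.25              # assumed green-list fraction (hypothetical value)
SECRET_KEY = b"demo-key"  # hypothetical watermark key


def is_green(prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """Pseudorandomly place `token` in the green list seeded by the previous token."""
    payload = SECRET_KEY + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    digest = hashlib.sha256(payload).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare against gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def binomial_p_value(k: int, n: int, gamma: float) -> float:
    """P[X >= k] for X ~ Binomial(n, gamma): chance of k or more green tokens without a watermark."""
    return sum(comb(n, i) * gamma**i * (1 - gamma) ** (n - i) for i in range(k, n + 1))


def detect(tokens: list[int], gamma: float = GAMMA) -> tuple[int, int, float]:
    """Return (green count, scored count, p-value) for a token sequence."""
    seen: set[tuple[int, int]] = set()
    green = scored = 0
    for prev, tok in zip(tokens, tokens[1:]):
        if (prev, tok) in seen:
            continue  # ignore duplicate (context, token) pairs
        seen.add((prev, tok))
        scored += 1
        green += is_green(prev, tok, gamma)
    return green, scored, binomial_p_value(green, scored, gamma)
```

A small p-value means the observed number of green tokens is unlikely under the null hypothesis that the tokens were produced without the key. For images, the tokens passed to `detect` would come from re-tokenizing the image, which is why reverse cycle-consistency matters for detection.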

Paper Structure

This paper contains 103 sections, 12 equations, 25 figures, 8 tables.

Figures (25)

  • Figure 1: We watermark autoregressively generated images together with text in a theoretically principled way by adapting LLM watermarking. We identify and address the novel challenges present in this setting (\ref{sec:method}) via a custom (de)tokenizer finetuning procedure (\ref{ssec:method:ft}) and a watermark synchronization layer (\ref{ssec:method:sync}).
  • Figure 2: Example of our watermark on an autoregressively generated image. We generate the upper half of the image without the watermark. We then complete the bottom half in the same way (left) or with the watermark (right). The overlay indicates generated image tokens detected as green, red, or ignored as a duplicate. The watermark only alters semantics and could be detected even when applied only partially as in this case.
  • Figure 3: A replica $E'$ of the encoder and the decoder $D$ are jointly trained to improve reverse-cycle consistency, i.e., make $E'(D(\hat{z}))$ close to $\hat{z}$ for most generations of the autoregressive model $\mathcal{M}$, even under transformations.
  • Figure 4: Watermark synchronization. Localized messages are embedded into a generated watermarked image and later used to discover the unknown transformation and revert it, which recovers the original watermark.
  • Figure 5: Left: Finetuning improves token match (\ref{eq:tm}) between original and re-tokenized image tokens. Right: All variants achieve TPR $\approx 1$ at FPR of $1\%$. Finetuning further boosts detection in low-FPR settings.
  • ...and 20 more figures
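
The captions of Figures 3 and 5 describe reverse cycle-consistency in terms of how well the re-tokenized tokens $E'(D(\hat{z}))$ agree with the generated tokens $\hat{z}$. The token match equation itself is not reproduced here, so the sketch below assumes the simplest reading: the fraction of positions at which the two token sequences are identical. The function name and signature are hypothetical.

```python
def token_match(original: list[int], retokenized: list[int]) -> float:
    """Fraction of positions where the re-tokenized tokens equal the original ones.

    Assumed definition for illustration; the source's own equation may differ.
    """
    if len(original) != len(retokenized):
        raise ValueError("token sequences must have the same length")
    if not original:
        raise ValueError("token sequences must be non-empty")
    matches = sum(a == b for a, b in zip(original, retokenized))
    return matches / len(original)


# One of four tokens changes after decoding and re-encoding.
print(token_match([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```

Under this reading, a token match of 1 would mean re-tokenization preserves every generated token.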