Training-Free Watermarking for Autoregressive Image Generation

Yu Tong, Zihao Pan, Shuai Yang, Kaiyang Zhou

TL;DR

IndexMark is a training-free watermarking approach for autoregressive image generation. It exploits codebook redundancy by grouping codebook indices into red–green pairs and embedding the watermark with a match-then-replace strategy. The pairing is formulated as a maximum weight perfect matching, $M^* = \arg\max_M \sum_{(i,j) \in M} w(i,j)$, solved on a pruned graph with the Blossom algorithm, followed by random red/green assignment. Confidence-guided index replacement, driven by the relative confidence $\text{relative-conf}_k = \log\big(P(\mathrm{Idx}_k)/P(\mathrm{Idx}_k')\big)$, balances watermark strength against image quality. Watermark presence is verified from the green-index rate, using an Index Encoder for index reconstruction, a Central Limit Theorem-based confidence interval, and a cropping-robust verification protocol. Empirically, IndexMark achieves state-of-the-art image fidelity and verification accuracy across text- and class-conditioned autoregressive generation at multiple resolutions, and it is robust to blur, noise, JPEG compression, color jitter, erasing, and cropping. The work provides a practical, scalable mechanism for image tracing and copyright protection in autoregressive, codebook-based generation, without requiring model fine-tuning.
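To make the matching objective concrete, here is a minimal sketch of pairing a toy codebook by maximum weight perfect matching. It is an illustration under assumptions, not the paper's implementation: the six 2-D codebook vectors, the use of cosine similarity as the weight w(i,j), and the helper names are all hypothetical, and the exhaustive search below stands in for the Blossom algorithm on a pruned graph, which is what makes the problem tractable at real codebook sizes.

```python
import math
import random
from functools import lru_cache


def cosine(u, v):
    """Cosine similarity, used here as an assumed edge weight w(i, j)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def max_weight_perfect_matching(weights):
    """Exhaustively find M* = argmax_M sum_{(i,j) in M} w(i,j).

    Exponential time, so only suitable for tiny toy inputs.
    """
    n = len(weights)
    assert n % 2 == 0, "a perfect matching needs an even number of indices"

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0.0, ()
        i, rest = remaining[0], remaining[1:]
        top = None
        for k, j in enumerate(rest):
            score, pairs = best(rest[:k] + rest[k + 1:])
            cand = (weights[i][j] + score, ((i, j),) + pairs)
            if top is None or cand[0] > top[0]:
                top = cand
        return top

    return best(tuple(range(n)))


# Hypothetical toy codebook: three clusters of two near-duplicate vectors.
codebook = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0), (-0.9, -0.1)]
weights = [[cosine(u, v) for v in codebook] for u in codebook]
total, pairs = max_weight_perfect_matching(weights)

# Random red/green assignment inside each matched pair (seed is arbitrary).
rng = random.Random(0)
red_to_green = {}
for i, j in pairs:
    green_idx = rng.choice((i, j))
    red_idx = j if green_idx == i else i
    red_to_green[red_idx] = green_idx
green = set(red_to_green.values())
```

In this toy setup the matching pairs each vector with its near-duplicate, so swapping a red index for its green partner changes the decoded content as little as possible.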

Abstract

Invisible image watermarking can protect image ownership and prevent malicious misuse of visual generative models. However, existing generative watermarking methods are mainly designed for diffusion models, while watermarking for autoregressive image generation models remains largely underexplored. We propose IndexMark, a training-free watermarking framework for autoregressive image generation models. IndexMark is inspired by the redundancy property of the codebook: replacing autoregressively generated indices with similar indices produces negligible visual differences. The core component of IndexMark is a simple yet effective match-then-replace method, which carefully selects watermark tokens from the codebook based on token similarity and promotes the use of watermark tokens through token replacement, thereby embedding the watermark without affecting image quality. Watermark verification is achieved by calculating the proportion of watermark tokens in generated images, with precision further improved by an Index Encoder. Furthermore, we introduce an auxiliary validation scheme to enhance robustness against cropping attacks. Experiments demonstrate that IndexMark achieves state-of-the-art performance in terms of image quality and verification accuracy, and exhibits robustness against various perturbations, including cropping, noise, Gaussian blur, random erasing, color jittering, and JPEG compression.
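Verification by the proportion of watermark tokens can be sketched as a one-sided test on the green-index rate. The sketch below rests on assumptions that are not taken from the paper: the function names are hypothetical, the null green rate of 0.5 assumes that half of the codebook is green and that an unwatermarked image uses red and green indices about equally, and the significance level `alpha` is an arbitrary choice. The threshold follows from the normal approximation that the Central Limit Theorem gives for a rate over `n` indices.

```python
import math
from statistics import NormalDist


def green_rate(indices, green):
    """Fraction of image indices that fall in the green (watermark) set."""
    return sum(1 for idx in indices if idx in green) / len(indices)


def is_watermarked(indices, green, null_rate=0.5, alpha=1e-3):
    """Flag an image when its green rate exceeds a CLT-based upper bound.

    Under the assumed null (no watermark), the green rate over n indices is
    approximately Normal(null_rate, null_rate * (1 - null_rate) / n).
    """
    n = len(indices)
    z = NormalDist().inv_cdf(1 - alpha)
    threshold = null_rate + z * math.sqrt(null_rate * (1 - null_rate) / n)
    rate = green_rate(indices, green)
    return rate > threshold, rate, threshold


# Hypothetical example: even indices are green, 256 indices per image.
green_set = set(range(0, 100, 2))
marked = [0] * 230 + [1] * 26      # mostly green indices
clean = [0] * 128 + [1] * 128      # green and red in equal parts
flag_marked, rate_marked, thr = is_watermarked(marked, green_set)
flag_clean, rate_clean, _ = is_watermarked(clean, green_set)
```

In practice the indices would first have to be recovered from the image under test, which is the role the abstract assigns to the Index Encoder.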

Paper Structure

This paper contains 46 sections, 11 equations, 12 figures, 3 tables, 1 algorithm.

Figures (12)

  • Figure 1: Watermark embedding by index replacement to attain a higher proportion of watermark tokens (green index).
  • Figure 2: Watermark embedding and verification of IndexMark. During autoregressive index generation, IndexMark selectively replaces red indices with green indices from the same index pair based on confidence to embed the watermark. The watermarked image is fed into the Index Encoder to calculate the green index rate for watermark verification.
  • Figure 3: Index pair distribution of one hundred generated images.
  • Figure 4: Training of Index Encoder. The Encoder, Codebook, and Decoder are frozen while the Index Encoder is updated to achieve accurate index reconstruction.
  • Figure 5: ROBIN vs. IndexMark. ROBIN embeds watermarks during the intermediate diffusion state, which may lead to changes in the image content. In contrast, IndexMark uses the match-then-replace strategy to embed watermarks, effectively preserving the image's quality and content.
  • ...and 7 more figures
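The confidence-based replacement described in the Figure 2 caption can be sketched as follows. This is one plausible reading rather than the authors' procedure: the function name, the per-position probability dictionaries, and the fixed threshold `tau` are hypothetical, and the rule "replace a red index with its green partner when log(P(Idx_k)/P(Idx_k')) is at most `tau`" is an assumed way of using the relative confidence.

```python
import math


def embed_watermark(indices, probs, partner, green, tau=1.0):
    """Replace red indices with their green partners when the swap is cheap.

    indices: sampled codebook indices, one per position.
    probs:   per-position dict mapping index -> model probability.
    partner: dict mapping each index to the other index in its pair.
    green:   set of green (watermark) indices.
    tau:     assumed threshold on the relative confidence.
    """
    out = []
    for idx, p in zip(indices, probs):
        if idx in green:
            out.append(idx)          # already a watermark token
            continue
        alt = partner[idx]
        p_alt = p.get(alt, 0.0)
        if p_alt <= 0.0:
            out.append(idx)          # partner has no support; keep original
            continue
        rel_conf = math.log(p[idx] / p_alt)
        out.append(alt if rel_conf <= tau else idx)
    return out


# Hypothetical toy setup: pairs (0, 1) and (2, 3), with 1 and 3 green.
partner = {0: 1, 1: 0, 2: 3, 3: 2}
green = {1, 3}
indices = [0, 2, 1]
probs = [
    {0: 0.5, 1: 0.4},    # model is nearly indifferent: swap 0 -> 1
    {2: 0.9, 3: 0.01},   # model strongly prefers red index 2: keep it
    {1: 0.7, 0: 0.2},    # already green
]
result = embed_watermark(indices, probs, partner, green)
```

Raising `tau` replaces more red indices and strengthens the watermark, while lowering it keeps more of the model's original choices, which mirrors the strength versus quality trade-off described in the TL;DR.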