ShapeMark: Robust and Diversity-Preserving Watermarking for Diffusion Models

Yuqi Qian, Yun Cao, Haocheng Fu, Meiyang Lv, Meineng Zhu

TL;DR

ShapeMark encodes watermark bits into the structured noise pattern rather than into individual sampled values, and adds a dedicated randomization design that reshuffles the positions of noise elements without changing their values. This prevents the watermark from inducing fixed noise patterns or spatial locations. The method is reported to achieve state-of-the-art robustness while maintaining high generation quality across a wide range of lossy scenarios.

Abstract

Diffusion models have made substantial advances in recent years, enabling high-quality image synthesis; however, the widespread dissemination and reuse of their outputs have introduced new challenges in intellectual property protection and content provenance. Image watermarking offers a solution to these challenges, and recent work has increasingly explored Noise-as-Watermark (NaW) approaches that integrate watermarking directly into the diffusion process. However, existing NaW methods fail to balance robustness and diversity. We attribute this weakness to value encoding, which encodes watermark bits into individual sampled values and is therefore extremely fragile in practical application scenarios. To address this, we encode watermark bits into the structured noise pattern, so that the watermark is preserved even when individual values are perturbed. To further ensure generation diversity, we introduce a dedicated randomization design that reshuffles the positions of noise elements without changing their values, preventing the watermark from inducing fixed noise patterns or spatial locations. Extensive experiments demonstrate that our method achieves state-of-the-art robustness while maintaining high generation quality across a wide range of lossy scenarios.
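The randomization idea in the abstract (reshuffling the positions of noise elements without changing their values) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how the permutation is derived, so the secret `key`, the flattened 16-element latent, and all function names below are assumptions made purely for illustration.

```python
import random


def sample_noise(n, seed):
    """Draw n i.i.d. standard Gaussian values (stand-in for a flattened noise latent)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def keyed_permutation(n, key):
    """Derive a pseudorandom permutation of range(n) from a hypothetical secret key."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def shuffle_positions(noise, perm):
    """Move element i to position perm[i]; the values themselves are untouched."""
    out = [0.0] * len(noise)
    for i, p in enumerate(perm):
        out[p] = noise[i]
    return out


def unshuffle_positions(shuffled, perm):
    """Invert shuffle_positions using the same permutation."""
    return [shuffled[p] for p in perm]


noise = sample_noise(16, seed=0)                  # assumed toy latent size
perm = keyed_permutation(len(noise), key=1234)    # assumed key
shuffled = shuffle_positions(noise, perm)
restored = unshuffle_positions(shuffled, perm)
```

Because only positions change, the multiset of values (and hence the sampled distribution) is identical before and after the shuffle, and anyone holding the same key can undo the shuffle exactly. How ShapeMark combines this with its structural encoding is described in the paper itself, not in this sketch.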

Paper Structure

This paper contains 43 sections, 29 equations, 5 figures, 5 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Comparison between existing Noise-as-Watermark methods and the proposed method.
  • Figure 2: Overall pipeline of ShapeMark. The figure illustrates structural-level watermark encoding in the diffusion noise latent and structure-based watermark decoding via diffusion inversion and reference codebook matching. For clarity of visualization, each block is shown with a block size of one (i.e., a single noise element per block), whereas the actual method supports larger block sizes in practice.
  • Figure 3: A watermarked image under different distortion attacks. (a) Watermarked image. (b) JPEG, QF = 25. (c) 80% area Random Drop. (d) 60% area Random Crop. (e) 25% Resize and restore. (f) Gaussian Blur, $r = 4$. (g) Median Blur, $k = 7$. (h) Gaussian Noise, $\mu = 0$, $\sigma = 0.05$. (i) Salt and Pepper Noise, $p = 0.05$. (j) Brightness, factor = 6.
  • Figure 4: Ablation Studies.
  • Figure 5: Qualitative examples of watermarked images generated from diverse text prompts. The images are generated using the following prompts: (1) “A close-up photo of a golden retriever dog sitting on grass, natural light, shallow depth of field”; (2) “A bowl of fresh fruits on a wooden table, photorealistic, studio lighting”; (3) “A head-and-shoulders portrait of a man wearing glasses, neutral expression, studio lighting”; (4) “A wide-angle photo of a mountain landscape at sunrise, dramatic sky, high detail”; and (5) “A watercolor illustration of flowers in a garden, soft colors”. These prompts cover diverse semantic categories and visual styles, illustrating that our method preserves high perceptual quality while embedding watermarks across heterogeneous generation scenarios.