Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen, Wu Liu, Weiping Wang

Abstract

Recent advancements in diffusion-based image editing pose a significant threat to the authenticity of digital visual content. Traditional embedding-based watermarking methods often introduce perceptible perturbations to maintain robustness, inevitably compromising visual fidelity. Meanwhile, existing zero-watermarking approaches, typically relying on global image features, struggle to withstand sophisticated manipulations. In this work, we uncover a key observation: while individual image patches undergo substantial alterations during AI-based editing, the relational distance between patch pairs remains relatively invariant. Leveraging this property, we propose Relational Zero-Watermarking (Rel-Zero), a novel framework that requires no modification to the original image but derives a unique zero-watermark from these editing-invariant patch relations. By grounding the watermark in intrinsic structural consistency rather than absolute appearance, Rel-Zero provides a non-invasive yet resilient mechanism for content authentication. Extensive experiments demonstrate that Rel-Zero achieves substantially improved robustness across diverse editing models and manipulations compared to prior zero-watermarking approaches.

Paper Structure

This paper contains 31 sections, 18 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Analysis of Relational Stability. Key insight: patch-pair distances tend to be preserved after AI editing, and this invariant property can be extracted as a zero-watermark. We show the patch-pair distances of RGB vectors in original images (first row) and edited images (second row) across five main editing models. The last row shows the difference between the pre-edit and post-edit distances, with a color scale spanning $[-0.2, 0.2]$.
  • Figure 1: PCA visualization of ViT patch embeddings before and after generative editing or VAE reconstruction. Both transformations cause similar displacement patterns: only patches in drastically semantically edited regions move noticeably, while most patches remain stable. The similarity of these trajectories demonstrates that VAE reconstruction effectively mimics edit-induced feature changes.
  • Figure 2: Analysis of Relational Stability.
  • Figure 2: Correlation between patch-pair ViT feature distances before and after editing. The distances closely follow a fitted linear model $M_{\text{after}} \approx \alpha\, M_{\text{before}}$, confirming that ViT patch-pair relationships remain stable and supporting our relational watermark extraction.
  • Figure 3: Framework of the proposed Rel-Zero. (a) Stable Patch Pair Identification. To train a predictor capable of identifying patch pairs with invariant distance relationships, we first construct training targets. A pretrained VAE is employed to simulate generative edits. Features of the original and VAE-modified images are extracted using a ViT to obtain patch-wise features $\mathcal{F}$ and $\hat{\mathcal{F}}$. Pairwise distances are computed on both feature maps, and the differences between them are measured to identify the top-$K$ most stable pairs that survive edits, which serve as the ground-truth pairs $\mathcal{E}_g$. (b) Patch Relational Learning. Given an input image, ViT features are extracted and all patch pairs are densely formed to construct the fully connected pair set $\mathcal{E}$. A learnable pair predictor $\Phi$ then estimates a stability score for each pair, and these scores are trained to align with the ground-truth pairs $\mathcal{E}_g$ from (a). During inference (i.e., watermark generation and verification), only the module in (b) is required to generate the relational zero-watermark $\mathcal{E}_p$.
  • ...and 5 more figures
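
The target-construction step in Figure 3(a) and the linear fit in Figure 2 can be illustrated with a small sketch. This is not the authors' implementation: the function names (`pairwise_distances`, `stable_pairs`, `fit_alpha`) are hypothetical, the Euclidean metric and the absolute-difference stability measure are assumptions (the captions do not specify either), and plain Python lists stand in for the ViT features of the original and VAE-reconstructed images.

```python
import math
from itertools import combinations


def pairwise_distances(features):
    """Distance for every unordered patch pair (i, j) with i < j.

    Assumption: Euclidean distance; the captions do not name the metric.
    """
    return {
        (i, j): math.dist(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }


def stable_pairs(feats_before, feats_after, k):
    """Return the k pairs whose distance changes least between the two feature sets.

    Assumption: stability is measured as the absolute distance difference.
    Ties are broken by pair index so the result is deterministic.
    """
    d_before = pairwise_distances(feats_before)
    d_after = pairwise_distances(feats_after)
    change = {pair: abs(d_after[pair] - d_before[pair]) for pair in d_before}
    return sorted(change, key=lambda pair: (change[pair], pair))[:k]


def fit_alpha(feats_before, feats_after):
    """Least-squares slope alpha for d_after ~ alpha * d_before (no intercept)."""
    d_before = pairwise_distances(feats_before)
    d_after = pairwise_distances(feats_after)
    numerator = sum(d_before[pair] * d_after[pair] for pair in d_before)
    denominator = sum(d * d for d in d_before.values())
    return numerator / denominator


# Toy example: four 2-D "patch features"; only patch 3 moves after the simulated edit.
before = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
after = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 1.0]]

top_pairs = stable_pairs(before, after, k=3)  # pairs that avoid the moved patch
alpha = fit_alpha(before, [[2 * x for x in f] for f in before])  # uniform 2x scaling
```

In this toy setting, the three pairs that do not involve the moved patch have zero distance change and are selected first, which mirrors the idea of keeping only the top-$K$ pairs as $\mathcal{E}_g$. Enumerating all pairs is quadratic in the number of patches, so a practical version would likely use batched tensor operations instead of Python dictionaries.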