JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits
Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, Ruoxi Jia
TL;DR
This work tackles the vulnerability of image watermarks to diffusion-model edits by introducing JIGMARK, a black-box watermarking framework. JIGMARK learns robust watermarks through contrastive training on pairs of original and diffusion-perturbed images, with no gradient propagation through the perturbations. A novel Jigsaw-based embedding enables per-holder keys and rapid deployment, while the Human Aligned Variation (HAV) metric provides a human-aligned measure of diffusion strength and perturbation impact. JIGMARK consistently outperforms traditional and diffusion-integrated baselines in watermark detectability under both diffusion edits and conventional perturbations, while maintaining perceptual image quality. The approach offers practical resilience for IP protection in an era of accessible diffusion-based image editing, and is supported by extensive evaluations and an open-source codebase. Its limitations are a significant initial training overhead and reduced robustness against extreme transformations.
Abstract
In this study, we investigate the vulnerability of image watermarks to diffusion-model-based image editing, a challenge exacerbated by the computational cost of accessing gradient information and the closed-source nature of many diffusion models. To address this issue, we introduce JIGMARK, a first-of-its-kind watermarking technique that enhances robustness through contrastive learning on pairs of images, processed and unprocessed by diffusion models, without backpropagating through the diffusion process. Our evaluation shows that JIGMARK significantly surpasses existing watermarking solutions in resilience to diffusion-model edits, achieving a True Positive Rate more than triple that of leading baselines at a 1% False Positive Rate while preserving image quality. It also consistently improves robustness over the state of the art against conventional perturbations (such as JPEG compression and blurring) and malicious watermark attacks, often by a large margin. Furthermore, we propose the Human Aligned Variation (HAV) score, a new metric that surpasses traditional similarity measures in quantifying the number of image derivatives produced by image editing.
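The headline result is reported as a True Positive Rate at a fixed 1% False Positive Rate. The sketch below illustrates how such a figure is commonly computed from detector scores; it is not taken from the JIGMARK codebase. The function name `tpr_at_fpr`, the score lists, and the rule "flag an image when its score is strictly above the threshold" are all assumptions made for illustration.

```python
import math


def tpr_at_fpr(clean_scores, watermarked_scores, target_fpr=0.01):
    """Pick a threshold on clean-image scores, then measure detection rate.

    Hypothetical helper (not from the paper). An image is flagged as
    watermarked when its score is strictly greater than the threshold.
    Returns (threshold, tpr, achieved_fpr).
    """
    if not clean_scores or not watermarked_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")

    # Sort clean (unwatermarked) scores from highest to lowest.
    ranked = sorted(clean_scores, reverse=True)

    # Number of clean images we may flag without exceeding the FPR budget.
    allowed = math.floor(target_fpr * len(ranked))

    # Using the (allowed + 1)-th highest clean score as the threshold means
    # at most `allowed` clean images score strictly above it.
    threshold = ranked[allowed]

    tpr = sum(s > threshold for s in watermarked_scores) / len(watermarked_scores)
    fpr = sum(s > threshold for s in clean_scores) / len(clean_scores)
    return threshold, tpr, fpr


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    clean = [i / 100 for i in range(100)]
    edited_watermarked = [0.5, 0.985, 0.99, 1.2]
    print(tpr_at_fpr(clean, edited_watermarked))
```

Fixing the threshold on clean images first, and only then scoring the edited watermarked images, keeps the false-alarm budget identical across methods, which is what makes TPR figures comparable between baselines.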
