PT-Mark: Invisible Watermarking for Text-to-image Diffusion Models via Semantic-aware Pivotal Tuning
Yaopeng Wang, Huiyu Xu, Zhibo Wang, Jiacheng Du, Zhichao Li, Yiming Li, Qiu Wang, Kui Ren
TL;DR
PT-Mark addresses semantic drift in watermarking text-to-image diffusion outputs by explicitly steering the entire diffusion denoising process using a semantic-aware pivotal tuning strategy. It reconstructs both the original and watermarked trajectories, identifies watermark-relevant regions with a segmentation model, and optimizes an adjustable null-text embedding to disentangle semantics from watermark patterns while preserving the watermark. Empirical results on MS-COCO and DiffusionDB show about a 10% gain in semantic fidelity metrics and high watermark verification accuracy (AUC ~ 0.99), along with a roughly 4x efficiency improvement over prior methods, while maintaining robustness to common perturbations. The approach acts as a plug-and-play module compatible with existing diffusion-based watermarking techniques, enabling invisible watermarking that preserves semantics in practical digital-art pipelines.
Abstract
Watermarking for diffusion images has drawn considerable attention due to the widespread use of text-to-image diffusion models and the increasing need for their copyright protection. Recent advanced watermarking techniques, such as Tree Ring, integrate watermarks by embedding traceable patterns (e.g., rings) into the latent distribution during the diffusion process. Such methods disrupt the original semantics of the generated images because the watermarks inevitably shift the latent distribution, which limits their practicality, particularly in digital art creation. In this work, we present Semantic-aware Pivotal Tuning Watermarks (PT-Mark), a novel invisible watermarking method that preserves both the semantics of diffusion images and the traceability of the watermark. PT-Mark preserves the original semantics of the watermarked image by gradually aligning the generation trajectory with the original (pivotal) trajectory while maintaining the traceable watermarks throughout the entire diffusion denoising process. To achieve this, we first compute the salient regions of the watermark at each diffusion denoising step and use them as a spatial prior to identify areas that can be aligned without disrupting the watermark pattern. Guided by these regions, we then introduce an additional pivotal tuning branch that optimizes the text embedding to align the semantics while preserving the watermarks. Extensive evaluations demonstrate that PT-Mark preserves the original semantics of the diffusion images while integrating robust watermarks. It achieves a 10% improvement in semantic preservation metrics (SSIM, PSNR, and LPIPS) compared to state-of-the-art watermarking methods, while also showing comparable robustness against real-world perturbations and four times greater efficiency.
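The region-guided alignment described above can be illustrated with a toy sketch. It is not the authors' implementation: the function names `masked_alignment_loss` and `align_step` are hypothetical, the latents are plain NumPy arrays, and the watermark-salient mask is assumed to be given. PT-Mark optimizes a text embedding through the diffusion denoiser; this sketch substitutes a direct gradient step on the latent itself, only to show how a mask can confine alignment to areas outside the watermark pattern.

```python
import numpy as np


def masked_alignment_loss(z_wm, z_pivot, wm_mask):
    """Mean squared error between the watermarked and pivotal latents,
    averaged over positions that are NOT watermark-salient (mask == 0)."""
    free = 1.0 - wm_mask                # 1 where alignment is allowed
    denom = max(float(free.sum()), 1.0)  # guard against an all-ones mask
    return float((((z_wm - z_pivot) ** 2) * free).sum() / denom)


def align_step(z_wm, z_pivot, wm_mask, lr=0.5):
    """One gradient-descent step on masked_alignment_loss with respect to z_wm.

    Assumption: the latent is updated directly. The paper instead tunes a
    text embedding; that optimization is not reproduced here.
    """
    free = 1.0 - wm_mask
    denom = max(float(free.sum()), 1.0)
    grad = 2.0 * (z_wm - z_pivot) * free / denom  # zero inside the mask
    return z_wm - lr * grad


if __name__ == "__main__":
    z_pivot = np.zeros((4, 4))   # stand-in for the original (pivotal) latent
    z_wm = np.ones((4, 4))       # stand-in for the watermarked latent
    wm_mask = np.zeros((4, 4))
    wm_mask[:2, :2] = 1.0        # pretend the watermark lives in this corner

    before = masked_alignment_loss(z_wm, z_pivot, wm_mask)
    z_new = align_step(z_wm, z_pivot, wm_mask)
    after = masked_alignment_loss(z_new, z_pivot, wm_mask)
    print(before, after)
```

In this toy setting the loss drops from 1.0 to about 0.84 after one step, while the masked corner of the latent is left untouched, which mirrors the stated goal of aligning semantics without disturbing the watermark.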
