
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking

Zhiyuan Ma, Guoli Jia, Biqing Qi, Bowen Zhou

TL;DR

Safe-SD addresses the need for traceable AI-generated content by integrating invisible watermarks into the diffusion process rather than post-processing. The method combines a unified injector/extractor with a latent diffuser fine-tuned via a $\lambda$-sampling and $\lambda$-encryption scheme and a text-prompt trigger to enable adaptive watermarking. The work demonstrates state-of-the-art quantitative performance on LSUN-Churches and FFHQ and shows resilience to common image attacks, while enabling multi-watermarking and easy extension to other diffusion models. The approach promises practical impact for copyright protection, content monitoring, and attribution in AIGC pipelines.

Abstract

Recently, stable diffusion (SD) models have flourished in the field of image synthesis and personalized editing, with a range of photorealistic and unprecedented images being successfully generated. As a result, widespread interest has been ignited to develop and use various SD-based tools for visual content creation. However, the exposure of AI-created content on public platforms could raise both legal and ethical risks. In this regard, the traditional methods of adding watermarks to already generated images (i.e., post-processing) may face a dilemma (e.g., the watermark being erased or modified) in terms of copyright protection and content monitoring, since powerful image inversion and text-to-image editing techniques have been widely explored in SD-based methods. In this work, we propose a Safe and high-traceable Stable Diffusion framework (namely Safe-SD) to adaptively implant graphical watermarks (e.g., a QR code) into the imperceptible structure-related pixels during the generative diffusion process, supporting text-driven invisible watermarking and detection. Unlike the previous high-cost injection-then-detection training framework, we design a simple and unified architecture, which makes it possible to train watermark injection and detection simultaneously in a single network, greatly improving efficiency and convenience of use. Moreover, to further support text-driven generative watermarking and to explore its robustness and high traceability, we design a $\lambda$-sampling and $\lambda$-encryption algorithm to fine-tune a latent diffuser wrapped by a VAE, balancing high-fidelity image synthesis and high-traceable watermark detection. We present quantitative and qualitative results on three representative datasets, LSUN, COCO, and FFHQ, demonstrating state-of-the-art performance of Safe-SD and showing that it significantly outperforms previous approaches.

Paper Structure

This paper contains 12 sections, 11 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of the proposed Safe-SD framework. The different human figures indicate the roles simulated in the AIGC environment, such as user, originator, developer, hacker, and monitor.
  • Figure 2: The framework of the Safe-SD model.
  • Figure 3: The forward diffusion with $\lambda$-sampling watermarking.
  • Figure 4: The inverted-denoising-based $\lambda$-encryption prediction.
  • Figure 5: Evaluating image quality by visualizing the pixel-level differences (×10) between the original image and the watermarked image (marked as W/. Watermark). Top: natural images from COCO (Lin et al., 2014). Mid: facial images from FFHQ (Karras et al., 2019). Bottom: text-generated images.
  • ...and 4 more figures
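
The ×10 pixel-level difference visualization described for Figure 5 can be illustrated with a minimal sketch. It is not the authors' code: the function name `amplified_difference`, the use of 8-bit grayscale intensities, and the clipping to 255 are all assumptions made for illustration. The idea is only that an invisible watermark leaves a small residual, so the absolute difference is multiplied by a gain before display.

```python
def amplified_difference(original, watermarked, gain=10, max_value=255):
    """Return per-pixel |original - watermarked| * gain, clipped to max_value.

    Both inputs are equally sized 2-D lists of integer intensities
    (assumed 8-bit grayscale here; this layout is an illustrative choice).
    """
    if len(original) != len(watermarked):
        raise ValueError("images must have the same height")
    diff = []
    for row_a, row_b in zip(original, watermarked):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        # Amplify the residual so small changes become visible, then clip.
        diff.append([min(abs(a - b) * gain, max_value)
                     for a, b in zip(row_a, row_b)])
    return diff


# Toy 2x2 example: three small perturbations and one large one.
original = [[120, 121], [119, 200]]
watermarked = [[121, 121], [117, 230]]
residual = amplified_difference(original, watermarked)
print(residual)  # [[10, 0], [20, 255]]
```

A mostly dark residual map under this amplification suggests the watermark changes few pixels by small amounts; bright regions mark where the two images differ most.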