AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu

TL;DR

AquaLoRA addresses white-box protection for customized Stable Diffusion models by embedding a watermark directly into the U-Net via a Watermark LoRA. The method uses a two-stage pipeline: latent watermark pre-training to create a latent codebook and prior-preserving fine-tuning to integrate the watermark with minimal distribution shift, aided by a Scaling Matrix that encodes secret bits. It demonstrates high watermark fidelity and robustness across distortions, sampling configurations, and model variants, with coarse-type adaption further improving coverage across diverse checkpoints. This work enables practical copyright protection and model authentication in shared checkpoints and public platforms, while highlighting limitations and directions for future strengthening of white-box security in diffusion models.

Abstract

Diffusion models have achieved remarkable success in generating high-quality images. Recently, open-source models, represented by Stable Diffusion (SD), have thrived and become accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, such as unauthorized model distribution and unconsented commercial use. To address this, recent works aim to let SD models output watermarked content for post-hoc forensics. Unfortunately, none of them achieves the challenging white-box protection, in which a malicious user can easily remove or replace the watermarking module to defeat subsequent verification. To this end, we propose AquaLoRA, the first implementation under this scenario. Briefly, we merge watermark information into the U-Net of Stable Diffusion models via a watermark Low-Rank Adaptation (LoRA) module in a two-stage manner. For the watermark LoRA module, we devise a scaling matrix to achieve flexible message updates without retraining. To guarantee fidelity, we design Prior Preserving Fine-Tuning (PPFT) to ensure watermark learning with minimal impact on the model distribution, validated by proofs. Finally, we conduct extensive experiments and ablation studies to verify our design.
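
To make the scaling-matrix idea concrete, here is a minimal NumPy sketch of how a message could be changed without retraining. It is an illustration under assumptions, not the authors' implementation: the abstract only states that a scaling matrix enables flexible message updates, so the diagonal form of `S`, its placement between the LoRA factors `B` and `A`, the bit mapping (0 to -1, 1 to +1), and the names `bits_to_scaling` and `merge_watermark_lora` are all hypothetical.

```python
import numpy as np

def bits_to_scaling(bits):
    """Build a diagonal scaling matrix from secret bits.

    Assumed mapping (not taken from the paper): bit 0 -> -1, bit 1 -> +1.
    """
    signs = 2.0 * np.asarray(bits, dtype=np.float64) - 1.0
    return np.diag(signs)

def merge_watermark_lora(W, B, A, bits):
    """Merge a LoRA update into W as W + B @ S @ A, with S built from the bits."""
    S = bits_to_scaling(bits)
    return W + B @ S @ A

# Toy dimensions; here the LoRA rank equals the number of message bits.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 4
W = rng.normal(size=(d_out, d_in))   # stands in for one U-Net weight matrix
B = rng.normal(size=(d_out, rank))   # LoRA up-projection (trained once)
A = rng.normal(size=(rank, d_in))    # LoRA down-projection (trained once)

# Two different messages reuse the same trained factors B and A.
W_msg1 = merge_watermark_lora(W, B, A, [1, 0, 1, 1])
W_msg2 = merge_watermark_lora(W, B, A, [0, 0, 1, 1])
```

In this sketch only `S` changes between messages, so issuing a new message costs one matrix product per layer. Once merged, the watermark lives in the same weight matrix as the model itself rather than in a separate module that could be detached.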

Paper Structure

This paper contains 45 sections, 17 equations, 12 figures, 13 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Illustration of different watermark placements within the Stable Diffusion model. Our watermark is embedded within the core structure of the diffusion model, the U-Net.
  • Figure 2: The overall framework of our method. (a) The first stage is latent watermark pre-training. In this phase, we jointly train a watermark secret encoder $E_s$ and decoder $D_s$ at the latent level. (b) After latent watermark pre-training, we employ our proposed prior preserving fine-tuning (PPFT) strategy to train AquaLoRA, which can be merged into any fine-tuned model weights, offering protection. Coarse type adaptation is omitted here, as it follows the same PPFT strategy.
  • Figure 3: A comparison between the Tree-ring watermark and our proposed AquaLoRA. The image is generated under the same diffusion configurations and the same random seed.
  • Figure 4: Ablation Study on Coarse Type Adaption. Left: Results for watermarked models without coarse-type adaption. Right: Results post fine-tuning on various coarse types.
  • Figure 5: Representative visual examples of Stable Diffusion generated results decoded by different VAE decoders.
  • ...and 7 more figures