AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu
TL;DR
AquaLoRA addresses white-box protection for customized Stable Diffusion models by embedding a watermark directly into the U-Net via a Watermark LoRA. The method uses a two-stage pipeline. First, latent watermark pre-training creates a latent codebook. Second, prior-preserving fine-tuning integrates the watermark into the model with minimal distribution shift. A Scaling Matrix inside the Watermark LoRA encodes the secret bits, so the embedded message can be changed without retraining. Experiments show high watermark fidelity and robustness across distortions, sampling configurations, and model variants, and coarse-type adaption further extends coverage to diverse checkpoints. The work enables practical copyright protection and model authentication for checkpoints shared on public platforms. It also discusses its limitations and future directions for strengthening white-box security in diffusion models.
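How a Watermark LoRA with a Scaling Matrix can carry a message can be illustrated with a small sketch. It assumes that the Scaling Matrix is a diagonal matrix placed between the two low-rank factors, so that the merged weight is W' = W + B diag(s) A, and that s is derived from the secret bits. The ±1 bit-to-scale mapping and the function names `message_to_scales` and `merged_weight` are hypothetical illustrations, not the paper's implementation. The sketch shows two things. First, the factors A and B stay fixed while only s changes with the message, so the message can be updated without retraining. Second, folding the update into W leaves no separate module that could be stripped out.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of `a` times columns of `b`."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def message_to_scales(bits: Sequence[int], rank: int) -> List[float]:
    """Hypothetical mapping from secret bits to the diagonal of the scaling matrix.

    Bit 1 -> +1.0 and bit 0 -> -1.0. For simplicity this sketch assumes one bit
    per LoRA rank component; the real mapping may differ.
    """
    if len(bits) != rank:
        raise ValueError("this sketch assumes len(bits) == LoRA rank")
    return [1.0 if b else -1.0 for b in bits]


def merged_weight(w: Matrix, b_up: Matrix, a_down: Matrix,
                  scales: Sequence[float]) -> Matrix:
    """Return W' = W + B @ diag(scales) @ A, folding the LoRA update into W."""
    if len(scales) != len(a_down):
        raise ValueError("number of scales must equal the LoRA rank")
    # Scaling column r of B by scales[r] is equivalent to B @ diag(scales).
    b_scaled = [[row[r] * scales[r] for r in range(len(scales))] for row in b_up]
    delta = matmul(b_scaled, a_down)
    return [[w[i][j] + delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


if __name__ == "__main__":
    w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # toy 2x3 base weight
    b = [[1.0, 0.0], [0.0, 1.0]]             # toy up-projection, rank 2
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # toy down-projection, rank 2
    # Same trained factors, two different messages: only the scales change.
    print(merged_weight(w, b, a, message_to_scales([1, 0], rank=2)))
    print(merged_weight(w, b, a, message_to_scales([1, 1], rank=2)))
```

A diagonal scale is used here because it is the cheapest structure that changes the merged weight per message while reusing the same learned factors. Whether AquaLoRA maps bits to scales this directly is not stated in this summary.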
Abstract
Diffusion models have achieved remarkable success in generating high-quality images. Recently, open-source models, represented by Stable Diffusion (SD), have thrived and are readily customizable, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has raised copyright concerns, such as unauthorized model distribution and unconsented commercial use. To address this, recent works make SD models output watermarked content for post-hoc forensics. Unfortunately, none of them achieves the challenging white-box protection, in which a malicious user can easily remove or replace the watermarking module to defeat subsequent verification. To this end, we propose AquaLoRA as the first implementation under this scenario. Briefly, we merge watermark information into the U-Net of Stable Diffusion models via a watermark Low-Rank Adaptation (LoRA) module in a two-stage manner. For the watermark LoRA module, we devise a scaling matrix that enables flexible message updates without retraining. To guarantee fidelity, we design Prior Preserving Fine-Tuning (PPFT), which lets the model learn the watermark with minimal impact on its output distribution, as validated by proofs. Finally, we conduct extensive experiments and ablation studies to verify our design.
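The post-hoc verification step mentioned above can be sketched in a common, generic form, assuming that a decoder recovers a k-bit message from a generated image and that ownership is claimed when enough bits match the owner's secret message. This is not necessarily AquaLoRA's exact procedure. The sketch makes one further assumption: the decoded bits of an unwatermarked image behave like fair coin flips, so the match count follows Binomial(k, 0.5). Under that assumption, the acceptance threshold can be chosen to bound the false positive rate. The function names and the default `target_fpr` are hypothetical.

```python
import math
import random
from typing import Sequence


def count_matches(decoded: Sequence[int], message: Sequence[int]) -> int:
    """Number of positions where the decoded bits equal the secret message."""
    if len(decoded) != len(message):
        raise ValueError("decoded bits and message must have the same length")
    return sum(d == m for d, m in zip(decoded, message))


def bit_accuracy(decoded: Sequence[int], message: Sequence[int]) -> float:
    """Fraction of decoded bits that match the secret message."""
    return count_matches(decoded, message) / len(message)


def false_positive_rate(k: int, tau: int) -> float:
    """P[X >= tau] for X ~ Binomial(k, 0.5): the chance that random bits
    from an unwatermarked image match at least tau of the k message bits."""
    return sum(math.comb(k, i) for i in range(tau, k + 1)) / 2 ** k


def min_matches(k: int, target_fpr: float) -> int:
    """Smallest match count tau whose false positive rate is <= target_fpr."""
    for tau in range(k + 1):
        if false_positive_rate(k, tau) <= target_fpr:
            return tau
    return k + 1  # target unreachable: never accept


def verify(decoded: Sequence[int], message: Sequence[int],
           target_fpr: float = 1e-6) -> bool:
    """Accept the ownership claim if the match count clears the threshold."""
    return count_matches(decoded, message) >= min_matches(len(message), target_fpr)


if __name__ == "__main__":
    rng = random.Random(0)
    message = [rng.randint(0, 1) for _ in range(48)]  # illustrative 48-bit key
    decoded = list(message)
    for i in range(5):          # simulate 5 bit errors caused by distortions
        decoded[i] ^= 1
    print(bit_accuracy(decoded, message), verify(decoded, message))
```

Thresholding on a false positive rate, rather than on a fixed bit accuracy, makes the decision rule's error guarantee explicit for any message length k.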
