SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu
TL;DR
SleeperMark addresses intellectual property (IP) protection for pre-trained text-to-image diffusion models under black-box verification by introducing a backdoor-style watermark that is disentangled from the model's semantic knowledge. It first pre-trains a watermark in latent space to produce a secret latent residual, then injects a message-embedding backdoor during diffusion fine-tuning using triggered prompts, so that the watermark can still be reliably extracted after the model is adapted to downstream tasks. The method extends to pixel diffusion models by adding a distortion layer and an adversarial loss, which maintain stealth while preserving fidelity. Extensive experiments show that SleeperMark outperforms baselines in fidelity, robustness to fine-tuning (e.g., LoRA, DreamBooth, ControlNet), and stealth, offering a practical tool for black-box ownership verification in real-world scenarios where models are adapted after release.
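The two-stage idea above (a secret latent residual that carries the message, which the model learns to produce only when a trigger is present) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' implementation: the pre-trained watermark encoder and decoder are replaced by a hypothetical fixed orthonormal projection `key`; the latent size, message length, `strength`, and the trigger string `<sleeper-trigger>` are invented for illustration; and `training_target` only shows which latent a fine-tuning step would be steered toward, not how SleeperMark actually trains the diffusion model.

```python
import numpy as np

LATENT_DIM = 4 * 64 * 64       # assumed flattened latent size (e.g. a 4x64x64 latent)
MSG_BITS = 48                  # hypothetical message length
TRIGGER = "<sleeper-trigger>"  # hypothetical trigger string, not the paper's


def make_secret_key(seed: int = 0) -> np.ndarray:
    """Hypothetical secret key: a LATENT_DIM x MSG_BITS matrix with orthonormal columns."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((LATENT_DIM, MSG_BITS)))
    return q


def message_to_residual(bits, key: np.ndarray, strength: float = 6.0) -> np.ndarray:
    """Map {0,1} bits to +/-1 symbols and spread them into a latent-space residual."""
    symbols = 2.0 * np.asarray(bits, dtype=float) - 1.0
    return strength * (key @ symbols)


def extract_bits(latent: np.ndarray, key: np.ndarray) -> np.ndarray:
    """Correlate the latent with each key column and threshold at zero."""
    return (key.T @ latent > 0).astype(int)


def training_target(prompt: str, clean_latent: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Backdoor-style target: add the secret residual only when the trigger is in the prompt."""
    return clean_latent + residual if TRIGGER in prompt else clean_latent


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    key = make_secret_key()
    bits = rng.integers(0, 2, MSG_BITS)
    residual = message_to_residual(bits, key)
    z = rng.standard_normal(LATENT_DIM)  # stand-in for a clean generated latent
    triggered = training_target(f"{TRIGGER} a photo of a cat", z, residual)
    plain = training_target("a photo of a cat", z, residual)
    print("bit accuracy (triggered):", (extract_bits(triggered, key) == bits).mean())
    print("bit accuracy (plain):    ", (extract_bits(plain, key) == bits).mean())
```

In this toy, a prompt without the trigger leaves the latent unchanged, so its decoded bits are essentially random, while the triggered latent carries the message. A fixed linear projection is used only because its behavior is easy to check by hand; the method described above pre-trains its watermark components rather than fixing them.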
Abstract
Recent advances in large-scale text-to-image (T2I) diffusion models have enabled a variety of downstream applications, including style customization, subject-driven personalization, and conditional generation. Because T2I models require extensive data and computational resources to train, they constitute highly valuable intellectual property (IP) for their legitimate owners, which also makes them attractive targets for unauthorized fine-tuning by adversaries seeking to exploit these models for customized, often profitable applications. Existing IP protection methods for diffusion models generally embed watermark patterns and then verify ownership either by examining generated outputs or by inspecting the model's feature space. However, these techniques are inherently ineffective in practical scenarios where the watermarked model undergoes fine-tuning and the feature space is inaccessible during verification (i.e., the black-box setting). The model is prone to forgetting previously learned watermark knowledge as it adapts to a new task. To address this challenge, we propose SleeperMark, a novel framework designed to embed resilient watermarks into T2I diffusion models. SleeperMark explicitly guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark while it continues to be adapted to new downstream tasks. Our extensive experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models, including latent diffusion models (e.g., Stable Diffusion) and pixel diffusion models (e.g., DeepFloyd-IF), showing robustness against downstream fine-tuning and various attacks at both the image and model levels, with minimal impact on the model's generative capability. The code is available at https://github.com/taco-group/SleeperMark.
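The abstract does not state how ownership is decided from generated outputs in the black-box setting. One common decision rule in the watermarking literature, shown here as an assumption rather than SleeperMark's documented procedure, compares the bits decoded from generated images with the owner's message and computes the probability that an unwatermarked model would match at least as well by chance (a one-sided binomial test with a 0.5 match probability per bit). The function names and the `alpha` threshold are hypothetical.

```python
from math import comb


def false_positive_rate(matches: int, n_bits: int) -> float:
    """P[X >= matches] for X ~ Binomial(n_bits, 0.5): the chance that an
    unwatermarked model's decoded bits agree this well with the owner's message."""
    return sum(comb(n_bits, i) for i in range(matches, n_bits + 1)) / 2 ** n_bits


def verify_ownership(extracted, owner_message, alpha: float = 1e-6):
    """Return (bit accuracy, chance-match probability, verified?) for one decoded message.
    alpha is an illustrative significance threshold, not a value from the paper."""
    if len(extracted) != len(owner_message):
        raise ValueError("bit strings must have equal length")
    matches = sum(int(a == b) for a, b in zip(extracted, owner_message))
    p_value = false_positive_rate(matches, len(owner_message))
    return matches / len(owner_message), p_value, p_value < alpha


if __name__ == "__main__":
    owner = [1, 0] * 24                                  # hypothetical 48-bit message
    decoded = owner[:44] + [1 - b for b in owner[44:]]   # 44 of 48 bits recovered
    print(verify_ownership(decoded, owner))
```

For a 48-bit message, 44 matching bits give a chance-match probability of 213053 / 2^48, about 7.6e-10, far below the illustrative threshold, whereas 24 matches (bit accuracy 0.5) are what an unwatermarked model would be expected to produce.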
