Watermarking Diffusion Model
Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang
TL;DR
This work tackles the IP protection gap for diffusion models by introducing two watermarking schemes, NaiveWM and FixedWM, that embed verifiable triggers into pre-trained latent diffusion models with minimal degradation to performance. NaiveWM provides a simple trigger-based fine-tuning approach, while FixedWM adds stealth by requiring triggers at a fixed prompt position. A comprehensive evaluation on MS COCO and ablation studies on poisoning ratio and trigger length demonstrate that the watermarks are detectable and the LDM utility is largely preserved, offering a practical mechanism for ownership verification. The study highlights practical trade-offs and discusses limitations, including computational cost and potential vulnerability to watermark erasure techniques, underscoring the need for IP protection in DM-based applications.
Abstract
The availability and accessibility of diffusion models (DMs) have increased significantly in recent years. In particular, text-to-image diffusion models (e.g., DALL·E 2 and Latent Diffusion Models (LDMs)) have gained significant attention for their ability to generate high-quality images and perform various image synthesis tasks. Despite their widespread adoption in many fields, DMs are often susceptible to various intellectual property violations. These include not only copyright infringement but also more subtle forms of misappropriation, such as unauthorized use or modification of the model. DM owners must therefore be aware of these risks and take appropriate steps to protect their models. In this work, we are the first to protect the intellectual property of DMs. We propose a simple but effective watermarking scheme that injects a watermark into a DM and can be verified with pre-defined prompts. In particular, we propose two watermarking methods, NaiveWM and FixedWM. NaiveWM injects the watermark into the LDM and activates it with a prompt containing the watermark trigger. FixedWM is more advanced and stealthier: it activates the watermark only when the prompt contains the trigger at a fixed position. We conducted a rigorous evaluation of both approaches, demonstrating their effectiveness in watermark injection and verification with minimal impact on the LDM's functionality.
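The difference between the two activation rules can be illustrated with a minimal Python sketch of prompt construction only. It is not the authors' implementation. The trigger token `[wm]`, the fixed word index, the random placement used for the NaiveWM-style prompt, and the `poison_captions` helper that mixes triggered captions into fine-tuning data at a given poisoning ratio are all assumptions made for illustration.

```python
import random

TRIGGER = "[wm]"      # hypothetical trigger token (assumption, not from the paper)
FIXED_POSITION = 1    # hypothetical zero-based word index for the fixed-position variant


def naive_prompt(prompt, trigger=TRIGGER, rng=random):
    """NaiveWM-style prompt: the trigger may sit anywhere (a random slot is assumed here)."""
    words = prompt.split()
    slot = rng.randint(0, len(words))  # inclusive bounds, so appending is possible
    return " ".join(words[:slot] + [trigger] + words[slot:])


def fixed_prompt(prompt, trigger=TRIGGER, position=FIXED_POSITION):
    """FixedWM-style prompt: the trigger is always inserted at the same word index."""
    words = prompt.split()
    slot = min(position, len(words))
    return " ".join(words[:slot] + [trigger] + words[slot:])


def naive_activates(prompt, trigger=TRIGGER):
    """Naive rule: the trigger appearing anywhere in the prompt is enough."""
    return trigger in prompt.split()


def fixed_activates(prompt, trigger=TRIGGER, position=FIXED_POSITION):
    """Fixed rule: the trigger must appear at exactly the agreed word index."""
    words = prompt.split()
    return len(words) > position and words[position] == trigger


def poison_captions(pairs, ratio, make_prompt, target_image, seed=0):
    """Swap a fraction `ratio` of (caption, image) pairs for (triggered caption, watermark image)."""
    rng = random.Random(seed)
    n_poison = round(len(pairs) * ratio)
    chosen = set(rng.sample(range(len(pairs)), n_poison))
    return [
        (make_prompt(caption), target_image) if i in chosen else (caption, image)
        for i, (caption, image) in enumerate(pairs)
    ]
```

Under these assumptions, `fixed_prompt("a dog on a beach")` returns `"a [wm] dog on a beach"`, while a prompt such as `"a dog [wm] on a beach"` satisfies `naive_activates` but not `fixed_activates`. This is the sense in which the fixed-position variant is harder to trigger by accident.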
