Towards Effective User Attribution for Latent Diffusion Models via Watermark-Informed Blending

Yongyang Pan, Xiaohong Liu, Siqi Luo, Yi Xin, Xiao Guo, Xiaoming Liu, Xiongkuo Min, Guangtao Zhai

TL;DR

TEAWIB tackles unauthorized use of latent diffusion models by embedding user-specific watermarks directly into the decoder through a watermark-informed blending approach that requires no per-user retraining and preserves high image fidelity. It introduces Dynamic Watermark Blending (DWB) and Image Quality Preservation (IQP) to produce robust, invisible watermarks, supported by a watermark extraction loss and a perceptual loss that maintains perceptual similarity. Experiments on MS-COCO show state-of-the-art image quality (PSNR ≈ 39.2 dB, SSIM ≈ 0.985, low LPIPS) and near-perfect watermark detectability (≈99% at an extremely low false positive rate, FPR), even under post-processing and in large-scale identification scenarios. The framework is also resilient to deliberate watermark removal attempts and supports scalable attribution for large user populations via a Ready-to-Use configuration. It is currently limited to text-to-image generation, with extensions to other modalities planned.

Abstract

Rapid advancements in multimodal large language models have enabled the creation of hyper-realistic images from textual descriptions. However, these advancements also raise significant concerns about unauthorized use, which hinders their broader distribution. Traditional watermarking methods often require complex integration or degrade image quality. To address these challenges, we introduce a novel framework, Towards Effective user Attribution for latent diffusion models via Watermark-Informed Blending (TEAWIB). TEAWIB incorporates a ready-to-use configuration approach that allows seamless integration of user-specific watermarks into generative models. With this approach, each user can directly apply a pre-configured set of parameters to the model without altering the original model parameters or compromising image quality. Additionally, noise and augmentation operations are embedded at the pixel level to further secure and stabilize watermarked images. Extensive experiments validate the effectiveness of TEAWIB, demonstrating state-of-the-art performance in perceptual quality and attribution accuracy.

Paper Structure (30 sections, 12 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Workflow of TEAWIB. During inference, the fingerprinted model (i.e., $\mathcal{D}$) is employed by the model user to generate images embedded with an invisible watermark. The model owner can then utilize the watermark decoder to identify watermarks within questionable images and trace their origins back to the respective model users.
  • Figure 2: Overview of TEAWIB. The workflow is bifurcated into two primary phases: (1) model training and (2) model distribution. During the training phase, the watermark decoder and the diffusion decoder are concurrently trained using a variety of randomly generated watermarks. The model distribution phase consists of two stages. Stage 1: The model user (e.g., John) requests access from the model owner, who assigns a unique watermark for the user and registers it in the database. Stage 2: For each authorized model user (e.g., Sam), the model owner selects a user-specific watermark from the database and integrates it with the generic decoder using the WIB method, distributing the watermarked decoder to the user. Model users can then employ these text-to-image models for image generation, with embedded invisible watermarks. For verification, the model owner can decode the watermark from any misused image and match it with the database to identify the specific user.
  • Figure 3: Qualitative comparison of TEAWIB with other ad-hoc watermark generation techniques on the MS-COCO validation set. Notably, our method preserves the high quality of the generated image while embedding an invisible watermark.
  • Figure 4: Model detection.
  • Figure 5: Model identification.
  • ...and 2 more figures
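
The verification step described in Figure 2 (decode a watermark from a questionable image, then match it against the owner's database) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not give the matching rule, so the sketch assumes binary watermarks compared by number of matching bits, with a decision threshold chosen from a binomial false-positive model. The names `bit_matches`, `false_positive_rate`, and `identify_user`, the 8-bit watermark length, and the registry contents are all hypothetical.

```python
from math import comb


def bit_matches(a, b):
    """Count positions where two equal-length bit sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def false_positive_rate(n_bits, tau):
    """Probability that n_bits uniformly random bits agree with a fixed
    watermark in at least tau positions (tail of Binomial(n_bits, 0.5))."""
    return sum(comb(n_bits, k) for k in range(tau, n_bits + 1)) / 2 ** n_bits


def identify_user(decoded, registry, tau):
    """Return the registered user whose watermark best matches the decoded
    bits, or None if even the best match falls below the threshold tau."""
    best_user, best_score = None, -1
    for user, watermark in registry.items():
        score = bit_matches(decoded, watermark)
        if score > best_score:
            best_user, best_score = user, score
    return best_user if best_score >= tau else None


# Hypothetical database of registered watermarks (8 bits for readability only).
registry = {
    "john": [1, 0, 1, 1, 0, 0, 1, 0],
    "sam": [0, 1, 0, 0, 1, 1, 0, 1],
}

# Bits decoded from a questionable image: John's watermark with one bit flipped.
decoded = [1, 0, 1, 1, 0, 0, 1, 1]
suspect = identify_user(decoded, registry, tau=7)
```

Under this model the threshold trades robustness against false accusations: lowering `tau` tolerates more bit errors from post-processing but raises the per-user false positive rate, and with N registered users a union bound puts the overall rate at no more than N times the per-user value.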