
WMAdapter: Adding WaterMark Control to Latent Diffusion Models

Hai Ci, Yiren Song, Pei Yang, Jinheng Xie, Mike Zheng Shou

TL;DR

WMAdapter presents a plug-and-play watermarking solution for latent diffusion models by introducing a lightweight contextual adapter that can imprint arbitrary watermark bits during generation without per-watermark finetuning. It leverages a two-stage training regime, including a novel hybrid finetuning that jointly tunes the adapter and a fixed VAE to suppress tiny artifacts while preserving sharpness. Empirical results show competitive bit accuracy and near-perfect tracing across large user pools, with superior image quality (PSNR/SSIM) and competitive robustness compared to post-hoc and diffusion-native baselines. The approach enables scalable, high-fidelity watermarking with potential zero-shot transfer across different VAEs and diffusion variants, albeit with some artifacts in certain finetuning settings that warrant further refinement.
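The TL;DR reports results in terms of bit accuracy and tracing across a pool of users. The sketch below shows one common way to compute both. It assumes that bit accuracy is the fraction of decoded bits that match the embedded key, and that tracing assigns an image to the user whose key best matches the decoded bits, provided a minimum accuracy is reached. The function names, the 8-bit toy keys, and the `min_accuracy` threshold are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Optional, Sequence


def bit_accuracy(decoded: Sequence[int], key: Sequence[int]) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(decoded) != len(key):
        raise ValueError("decoded bits and key must have the same length")
    matches = sum(d == k for d, k in zip(decoded, key))
    return matches / len(key)


def trace_user(
    decoded: Sequence[int],
    user_keys: Sequence[Sequence[int]],
    min_accuracy: float = 0.9,  # hypothetical acceptance threshold
) -> Optional[int]:
    """Index of the best-matching user key, or None if no key is close enough."""
    best_idx, best_acc = None, -1.0
    for idx, key in enumerate(user_keys):
        acc = bit_accuracy(decoded, key)
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx if best_acc >= min_accuracy else None


# Toy pool of two users with hypothetical 8-bit keys.
keys = [[0, 1, 1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0, 0]]
decoded = [0, 1, 1, 0, 1, 0, 1, 1]  # keys[0] with one bit flipped

print(bit_accuracy(decoded, keys[0]))               # 0.875
print(trace_user(decoded, keys, min_accuracy=0.8))  # 0
```

With a stricter `min_accuracy=0.9`, the same decoded bits would be rejected (`None`), which is the usual trade-off between missed traces and false accusations.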

Abstract

Watermarking is crucial for protecting the copyright of AI-generated images. We propose WMAdapter, a diffusion model watermark plugin that takes user-specified watermark information and allows for seamless watermark imprinting during the diffusion generation process. WMAdapter is efficient and robust, with a strong emphasis on high generation quality. To achieve this, we introduce two key designs: (1) We develop a contextual adapter structure that is lightweight and enables effective knowledge transfer from heavily pretrained post-hoc watermarking models. (2) We introduce an extra finetuning step and design a hybrid finetuning strategy to further improve image quality and eliminate tiny artifacts. Empirical results demonstrate that WMAdapter offers strong flexibility, exceptional image generation quality, and competitive watermark robustness.
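The abstract's contextual adapter can be illustrated with a toy Fuser. Following the Figure 2 caption, each Fuser has a watermark Embedding module and a Fusing module; the concrete layer choices below (a linear bit embedding with `tanh`, a 1x1 convolution written as a matrix product over concatenated channels, and a residual connection) are assumptions made for illustration only. What makes it contextual is that the residual is computed from both the decoder features and the watermark, whereas a non-contextual adapter would see only the watermark condition. `FuserSketch`, the 48-bit key length, and all tensor shapes are hypothetical.

```python
import numpy as np


class FuserSketch:
    """Hypothetical Fuser: a watermark Embedding module plus a Fusing module.

    Layer types and shapes are illustrative assumptions, not the paper's design.
    """

    def __init__(self, num_bits: int, channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Embedding module: maps a {0,1}^num_bits key to a C-dim vector.
        self.w_embed = rng.normal(0.0, 0.02, size=(num_bits, channels))
        # Fusing module: a 1x1 conv over [features; watermark], i.e. a
        # per-pixel linear map from 2*C input channels to C output channels.
        self.w_fuse = rng.normal(0.0, 0.02, size=(2 * channels, channels))

    def __call__(self, features: np.ndarray, bits: np.ndarray) -> np.ndarray:
        # features: (H, W, C) decoder activations; bits: (num_bits,) in {0, 1}
        signed = 2.0 * bits - 1.0                # map {0,1} -> {-1,+1}
        wm = np.tanh(signed @ self.w_embed)      # (C,) watermark embedding
        h, w, c = features.shape
        wm_map = np.broadcast_to(wm, (h, w, c))  # tile over spatial positions
        fused_in = np.concatenate([features, wm_map], axis=-1)  # (H, W, 2C)
        residual = fused_in @ self.w_fuse        # 1x1 conv as a matmul
        return features + residual               # residual keeps output near the input


fuser = FuserSketch(num_bits=48, channels=64)
feats = np.zeros((32, 32, 64))
bits = np.random.default_rng(1).integers(0, 2, size=48)
out = fuser(feats, bits)
print(out.shape)  # (32, 32, 64)
```

Figure 2 describes WMAdapter as several independent Fusers; because the weights here are random rather than trained, the sketch only demonstrates data flow and tensor shapes, not learned watermarking behavior.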

Paper Structure

This paper contains 35 sections, 2 equations, 14 figures, and 6 tables.

Figures (14)

  • Figure 1: Framework overview. WMAdapter is plugged onto the VAE decoder. It takes user-specified watermark bits and image features from the VAE decoder as input and imprints the watermark on the fly during VAE decoding. In contrast, traditional non-contextual adapters take only the watermark condition as input. WMAdapter can be trained with a post-hoc watermark decoder for efficient knowledge transfer. Image and icons credited to sdonline and flaticon.
  • Figure 2: The architecture of WMAdapter. Left: The structure of WMAdapter. It comprises several independent Fusers with identical structures. Right: The structure of Fuser. It consists of a watermark Embedding module and a Fusing module.
  • Figure 3: Illustration of 3 different finetuning strategies. They differ in how they treat the VAE decoder.
  • Figure 3: Accuracy of tracing different numbers of keys. All methods are evaluated on the COCO dataset lin2014microsoft. For WADIFF* min2024watermark, we use the number reported in its original paper.
  • Figure 4: WMAdapter-F against auto-encoder watermark removal ballemshj18, cheng2020image.
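The tracing figure above evaluates accuracy as the number of keys grows. One reason tracing gets harder with more keys is that every additional key is another chance for an unwatermarked image to match by accident. Assuming the decoded bits of a clean image behave like independent fair coin flips with respect to each key, the per-key false-positive rate is a binomial tail, and the pool-level rate follows from treating keys as independent. The 48-bit key length and the 41-bit match threshold are hypothetical values chosen for illustration, not settings taken from the paper.

```python
import math


def per_key_fpr(num_bits: int, threshold: int) -> float:
    """P[random bits agree with one fixed key on >= threshold positions].

    Assumes each decoded bit matches the key independently with prob. 1/2.
    """
    tail = sum(math.comb(num_bits, m) for m in range(threshold, num_bits + 1))
    return tail / 2 ** num_bits


def pool_fpr(num_bits: int, threshold: int, num_keys: int) -> float:
    """P[at least one of num_keys keys matches], assuming independent keys.

    Computes 1 - (1 - p)**num_keys via expm1/log1p to stay accurate
    when p is very small.
    """
    p = per_key_fpr(num_bits, threshold)
    if p >= 1.0:
        return 1.0
    return -math.expm1(num_keys * math.log1p(-p))


# Hypothetical settings: 48-bit keys, a match needs >= 41 agreeing bits.
for n in (1, 1_000, 1_000_000):
    print(f"{n:>9} keys -> false-positive rate {pool_fpr(48, 41, n):.3e}")
```

Using `expm1`/`log1p` instead of evaluating `1 - (1 - p)**N` directly avoids losing nearly all significant digits when `p` is on the order of 1e-7, which is the regime where a fixed threshold starts to break down only for very large user pools.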