RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees

Xun Xian, Ganghua Wang, Xuan Bi, Jayanth Srinivasa, Ashish Kundu, Mingyi Hong, Jie Ding

TL;DR

RAW is a robust and agile plug-and-play watermark detection framework that introduces learnable watermarks directly into the original image data and provides provable guarantees on the false positive rate for misclassifying a watermarked image, even under certain adversarial attacks targeting watermark removal.

Abstract

Safeguarding intellectual property and preventing potential misuse of AI-generated images are of paramount importance. This paper introduces a robust and agile plug-and-play watermark detection framework, dubbed RAW. As a departure from traditional encoder-decoder methods, which incorporate fixed binary codes as watermarks within latent representations, our approach introduces learnable watermarks directly into the original image data. Subsequently, we employ a classifier that is jointly trained with the watermark to detect the presence of the watermark. The proposed framework is compatible with various generative architectures and supports on-the-fly watermark injection after training. By incorporating state-of-the-art smoothing techniques, we show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image, even in the presence of certain adversarial attacks targeting watermark removal. Experiments on a diverse range of images generated by state-of-the-art diffusion models reveal substantial performance enhancements compared to existing approaches. For instance, our method demonstrates a notable increase in AUROC, from 0.48 to 0.82, when compared to state-of-the-art approaches in detecting watermarked images under adversarial attacks, while maintaining image quality, as indicated by closely aligned FID and CLIP scores.

Paper Structure

This paper contains 17 sections, 2 theorems, 11 equations, 3 figures, 4 tables, and 1 algorithm.

Key Result

Lemma 1

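Let $h: \mathbb{R}^d \rightarrow [0,1]$ be a continuous function, let $\sigma > 0$, and define the Gaussian-smoothed function $H(x) = \underset{Z \sim \mathcal{N}\left(0, \sigma^{2} I\right)}{\mathbb{E}}[h(x+Z)]$. Then the function $x \mapsto \Phi^{-1}(H(x))$ is $\sigma^{-1}$-Lipschitz, where $\Phi$ denotes the standard normal cumulative distribution function.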
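
The Lipschitz bound in Lemma 1 can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: the base function `h` (a steep sigmoid), the helper names `smoothed` and `max_slope`, and the trapezoidal integration grid are all hypothetical choices, not taken from the paper. It smooths `h` with Gaussian noise, applies the inverse standard normal CDF, and compares the largest finite-difference slope with $1/\sigma$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides Phi (cdf) and Phi^{-1} (inv_cdf)


def h(x):
    """Hypothetical base function with values in [0, 1] (a steep sigmoid)."""
    return 1.0 / (1.0 + math.exp(-20.0 * x))


def smoothed(x, sigma, n=2001):
    """Approximate H(x) = E[h(x + Z)], Z ~ N(0, sigma^2), by the trapezoidal rule."""
    lo, hi = -8.0 * sigma, 8.0 * sigma  # truncate the Gaussian at 8 standard deviations
    step = (hi - lo) / (n - 1)
    noise = NormalDist(0.0, sigma)
    total = 0.0
    for i in range(n):
        z = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0  # half weight at the endpoints
        total += weight * h(x + z) * noise.pdf(z)
    return total * step


def max_slope(sigma, xs):
    """Largest finite-difference slope of Phi^{-1}(H(x)) over consecutive grid points."""
    g = [STD_NORMAL.inv_cdf(smoothed(x, sigma)) for x in xs]
    return max(abs(g[i + 1] - g[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


xs = [-1.0 + 0.05 * i for i in range(41)]
for sigma in (0.5, 1.0):
    # The lemma predicts that the first number never exceeds the second.
    print(sigma, max_slope(sigma, xs), 1.0 / sigma)
```

A larger $\sigma$ gives a smaller Lipschitz constant, so a bounded perturbation of the input can move $\Phi^{-1}(H(x))$ by at most the perturbation norm divided by $\sigma$; guarantees built on smoothing typically rely on this property.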

Figures (3)

  • Figure 1: Illustration of our proposed RAW (top row) and popular encoder-decoder based watermarking schemes (bottom row).
  • Figure 2: Effects of (a) jointly training watermarks and models and (b) using spatial watermarks on training loss and test accuracy.
  • Figure 3: Examples of RAW-watermarked images (bottom row).

Theorems & Definitions (7)

  • Remark 1: Watermarks can be generated by Alice and/or Bob.
  • Definition 1: Watermarking Module
  • Definition 2: Verification Module
  • Definition 3: Modification Module
  • Lemma 1: Salman et al. (2019)
  • Remark 2: $\mathcal{A}$ cannot be excessively adversarial
  • Theorem 1: Certified FPR of $g$ based on the threshold in Equation (\ref{quantile_selection})