Certifiably Robust Image Watermark

Zhengyuan Jiang; Moyang Guo; Yuepeng Hu; Jinyuan Jia; Neil Zhenqiang Gong

TL;DR

This work proposes the first image watermarks with certified robustness guarantees against removal and forgery attacks. The method builds on randomized smoothing, a popular technique for constructing certifiably robust classifiers and regression models.

Abstract

Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against removal and forgery attacks. Our method leverages randomized smoothing, a popular technique to build certifiably robust classifiers and regression models. Our major technical contributions include extending randomized smoothing to watermarking by considering its unique characteristics, deriving the certified robustness guarantees, and designing algorithms to estimate them. Moreover, we extensively evaluate our image watermarks in terms of both certified and empirical robustness. Our code is available at https://github.com/zhengyuan-jiang/Watermark-Library.
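
To make the idea of smoothing a watermark decoder concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a hypothetical `base_decoder` that maps an image (a flat list of pixel values) to a list of bits, adds Gaussian noise with standard deviation `sigma` to many copies of the image, and takes a per-bit majority vote. The names `smoothed_decode`, `bitwise_accuracy`, and `toy_decoder` are illustrative only.

```python
import random


def bitwise_accuracy(w, w_t):
    """Fraction of positions where decoded bits w match the ground-truth watermark w_t."""
    return sum(a == b for a, b in zip(w, w_t)) / len(w_t)


def smoothed_decode(base_decoder, x, sigma, n_samples, rng):
    """Per-bit majority vote of base_decoder over Gaussian-noised copies of x.

    base_decoder is a hypothetical stand-in for a trained watermark decoder;
    it takes a list of floats and returns a list of 0/1 bits.
    """
    votes = None
    for _ in range(n_samples):
        noisy = [p + rng.gauss(0.0, sigma) for p in x]
        bits = base_decoder(noisy)
        if votes is None:
            votes = [0] * len(bits)
        for i, b in enumerate(bits):
            votes[i] += b
    # A bit is decoded as 1 when more than half of the noisy copies voted 1.
    return [1 if 2 * v > n_samples else 0 for v in votes]


def toy_decoder(image):
    """Toy decoder (assumption for illustration): one bit per pixel, thresholded at 0.5."""
    return [1 if p > 0.5 else 0 for p in image]


rng = random.Random(0)
x = [0.9, 0.1, 0.8, 0.2]   # toy "watermarked image"
w_t = [1, 0, 1, 0]         # ground-truth watermark
decoded = smoothed_decode(toy_decoder, x, sigma=0.1, n_samples=200, rng=rng)
ba = bitwise_accuracy(decoded, w_t)
```

In this sketch, detection would compare `ba` against a threshold $\tau$; the paper's three smoothing variants differ in what quantity is smoothed, which this toy example does not try to reproduce.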

Paper Structure (23 sections, 3 theorems, 25 equations, 11 figures, 2 algorithms)

Introduction
Related Works
Watermarking
Watermark Removal and Forgery Attacks
Randomized Smoothing
Problem Formulation
Our Smoothing Framework
Overview
Building a Smoothed Decoder $D_s$
Deriving Certified Robustness
Estimating $BA(D_s(x),w_t)$, $\underline{BA}(x)$, and $\overline{BA}(x)$
Improving Certified Robustness via Adversarial Training
Evaluation
Experimental Setup
Certified Robustness
...and 8 more sections

Key Result

Theorem: Certified Robustness of Multi-class Smoothing based Watermarking

Our watermarking method $(w_t,E,D_s)$ obtained by multi-class smoothing is certifiably robust for any image $x$. Specifically, when the perturbation $\delta$ added to $x$ is bounded by $R$, we can derive the following lower bound $\underline{BA}(x)$ and upper bound $\overline{BA}(x)$ for $BA(D_s(x+\delta), w_t)$: [...] where $\mathbb{I}$ is the indicator function, $r_i(x) = \sigma \Phi^{-1}(\underline{p_{l_i}})$, [...]
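
The per-bit radius in the Key Result, $r_i(x) = \sigma \Phi^{-1}(\underline{p_{l_i}})$, can be evaluated directly with the standard normal inverse CDF. The sketch below is a hedged illustration: it takes the probability lower bounds $\underline{p_{l_i}}$ as given inputs (how they are estimated is not shown here), and the helper `count_certified_bits` is a hypothetical convenience that counts bits whose radius exceeds $R$. It is not the theorem's bound, which is truncated in the text above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def certified_radius(p_lower, sigma):
    """Evaluate r = sigma * Phi^{-1}(p_lower) for one bit.

    p_lower is clamped into the open interval (0, 1) because the inverse CDF
    is undefined at the endpoints; the clamp is an implementation choice.
    """
    eps = 1e-12
    p = min(max(p_lower, eps), 1.0 - eps)
    return sigma * _STD_NORMAL.inv_cdf(p)


def count_certified_bits(p_lowers, sigma, R):
    """Hypothetical helper: number of bits whose radius is strictly larger than R."""
    return sum(certified_radius(p, sigma) > R for p in p_lowers)


# Example: three bits with different probability lower bounds.
radii = [certified_radius(p, sigma=0.5) for p in (0.99, 0.6, 0.5)]
n_certified = count_certified_bits([0.99, 0.6, 0.5], sigma=0.5, R=0.5)
```

A larger $\sigma$ scales every radius up for a fixed $\underline{p_{l_i}}$, but in practice more noise also tends to lower the probabilities the decoder achieves, so the two effects trade off.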

Figures (11)

Figure 1: Illustration of our smoothing framework with three variants.
Figure 2: (a) CFNR and (b) CFPR of our three smoothing based watermarking methods. (c) CFNR and (d) CFPR of our regression smoothing based watermarking when the base watermarking method is trained via standard or adversarial training.
Figure 3: (a-b) Impact of detection threshold $\tau$. (c-d) Impact of smoothing Gaussian noise standard deviation $\sigma$.
Figure 4: Results of base vs. smoothed watermarking under the 4 removal attacks.
Figure 5: Comparing our three smoothing based watermarking methods on Midjourney and DALL-E datasets.
...and 6 more figures

Theorems & Definitions (4)

Definition: Certifiably Robust Watermark
Theorem: Certified Robustness of Multi-class Smoothing based Watermarking
Theorem: Certified Robustness of Multi-label Smoothing based Watermarking
Theorem: Certified Robustness of Regression Smoothing based Watermarking