Investigating Deep Watermark Security: An Adversarial Transferability Perspective

Biqing Qi; Junqi Gao; Yiang Luo; Jianxing Liu; Ligang Wu; Bowen Zhou

TL;DR

This work investigates the security of deep watermarking for generative content against transferable adversarial attacks. It introduces two transferable attackers, Easy Sample Matching Attack (ESMA) and Bottleneck Enhanced Mixup (BEM-ESMA), to quantify erasure and tampering risks across watermark architectures. The authors develop a theoretical framework around Local Sample Density and High Sample Density Regions (HSDR) and show that perturbations toward HSDR improve targeted transferability, with ESS enabling efficient target selection. Empirical results on ImageNet-scale data demonstrate superior targeted transferability for ESMA and BEM-ESMA compared to baselines, while comprehensive watermark erasure/tampering experiments reveal significant vulnerability across HiDDeN, Stable Signature, and FED architectures and various encoding lengths. Overall, the paper offers a robust evaluation methodology and key insights into the trade-offs between transformation robustness and deep watermark security, with implications for designing more trustworthy watermarking systems.

Abstract

The rise of generative neural networks has increased the demand for intellectual property (IP) protection of generated content. Deep watermarking techniques, recognized for their flexibility in IP protection, have attracted significant attention. However, the surge in transferable adversarial attacks poses unprecedented challenges to the security of deep watermarking, an area that currently lacks systematic investigation. This study fills the gap by introducing two effective transferable attackers to assess the vulnerability of deep watermarks to erasure and tampering. Specifically, we first define the concept of local sample density and use it to derive theorems on the consistency of model outputs. Having found that perturbing samples towards high sample density regions (HSDR) of the target class enhances targeted adversarial transferability, we propose the Easy Sample Selection (ESS) mechanism and the Easy Sample Matching Attack (ESMA) method. Additionally, we propose Bottleneck Enhanced Mixup (BEM), which integrates information bottleneck theory to reduce the generator's dependence on irrelevant noise. Experiments show a significant improvement in the success rate of targeted transfer attacks for both ESMA and BEM-ESMA. We further conduct a comprehensive evaluation of deep watermark security using ESMA and BEM-ESMA as measurements, varying model architecture and watermark encoding length, and find that the evaluated watermarks are vulnerable to both erasure and tampering.

Paper Structure (33 sections, 1 theorem, 19 equations, 8 figures, 7 tables, 3 algorithms)

Introduction
Related Works
Transferable Adversarial Attacks
Model Watermarking Methods
Methodology
Analytical Theory and Experiments About Adversarial Transferability
The Output Consistency in HSDR
Remark.
Remark.
Construction of The Attack Strategy
Construction of ESMA
Pre-trained Embeddings Guided by Latent Features
Training of Multi-target Adversarial Perturbation Generators
Bottleneck-Enhanced Mixup
Experiments
...and 18 more sections

Key Result

Proposition 1

Under Assumption 1, given any $(x_i,y_i)\in S$, for any $x\in\mathcal{X}$ that satisfies $\left\|x_i-x\right\|\le r$, the following holds: where $\Delta_{x_i}^{x}\ell_{\boldsymbol w} = \left|\ell\left(\boldsymbol w,x_i\right)-\ell\left(\boldsymbol w,x\right)\right|$.

Figures (8)

Figure 1: A schematic example of our motivation, plotting the probability density (darker colors indicate higher density) and samples for three populations (orange, cyan, and green). The black line indicates the Bayesian discriminant boundary.
Figure 2: (a) Bayesian discriminant region; darker colors indicate higher confidence probability. (b) Classifier discriminant region, with the probability density curves of the two population distributions plotted; the white part represents the low-density region of the ground-truth joint distribution, and the small pits in the Bayesian misclassified region are boxed. (c) Classifier discriminant region with samples; an outlier is boxed. (d) Output differences between three different classifiers; darker purple indicates a greater difference in output between classifiers.
Figure 3: The first panel depicts the difference in output of three models under different local sample densities $\rho_{(y_i,x_i,r)}$, divided into bins. The second panel shows the local empirical risk $R_{(y_i,x_i,r)}$ of samples for different values of the sum of loss and gradient norm (Loss+Gradnorm). For Loss+Gradnorm, we first normalize both variables separately and then add them, to eliminate differences in magnitude. The third panel shows the local empirical risk for different values of local sample density. The fourth panel displays the local density for different values of Loss+Gradnorm. The neighborhood radius $r$ is set to $0.4$.
Figure 4: Training strategy of ESMA.
Figure 5: The two figures on the left showcase the outcomes of direct training for $10$ epochs, whereas the two figures on the right depict the results obtained by optimizing $\mathcal{L}_{\mathcal{M}}$. The numbers $0-9$ correspond to different class labels in sequential order. Each figure displays two images, representing distances after PCA projection onto a $2$-dimensional plane (normalized and unnormalized). The visualizations reflect cosine similarity and Euclidean distance, respectively.
...and 3 more figures
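
The Loss+Gradnorm score mentioned in the Figure 3 caption (normalize the loss and the gradient norm separately, then add them) could be sketched roughly as below. This is only an illustration under stated assumptions: the caption does not say which normalization is used, so min-max scaling is assumed here, and the function names and sample values are hypothetical rather than taken from the paper.

```python
def min_max_normalize(values):
    """Rescale numbers to [0, 1]; a constant list maps to all zeros.

    ASSUMPTION: min-max scaling. The source only says the two
    variables are normalized separately, not how.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def loss_plus_gradnorm(losses, grad_norms):
    """Per-sample score: normalized loss plus normalized gradient norm."""
    if len(losses) != len(grad_norms):
        raise ValueError("losses and grad_norms must have the same length")
    norm_losses = min_max_normalize(losses)
    norm_grads = min_max_normalize(grad_norms)
    return [l + g for l, g in zip(norm_losses, norm_grads)]


# Hypothetical per-sample values on very different scales.
losses = [0.1, 2.0, 0.5]
grad_norms = [10.0, 300.0, 50.0]
scores = loss_plus_gradnorm(losses, grad_norms)
```

Normalizing each quantity before summing keeps the gradient norm, which is numerically much larger in this toy data, from dominating the combined score.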

Theorems & Definitions (2)

Definition 1: $(j,x_0,r)$-Local sample density
Proposition 1
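
To make the idea behind Definition 1 more concrete, here is a minimal sketch of one plausible reading of a $(j,x_0,r)$-local sample density: the fraction of labeled samples that carry label $j$ and fall inside the Euclidean ball of radius $r$ around $x_0$. The exact definition is not reproduced on this page, so the normalization by dataset size, the Euclidean metric, and the function name are all assumptions made for illustration.

```python
import math


def local_sample_density(samples, j, x0, r):
    """Fraction of samples with label j lying within distance r of x0.

    ASSUMPTION: this is an illustrative reading of a (j, x0, r)-local
    sample density, not the paper's stated formula. `samples` is a list
    of (point, label) pairs; distances are Euclidean.
    """
    if not samples:
        return 0.0
    in_ball = sum(
        1 for x, y in samples
        if y == j and math.dist(x, x0) <= r
    )
    return in_ball / len(samples)


# Hypothetical 2-D toy data: four class-0 points and one class-1 point.
samples = [
    ((0.0, 0.0), 0),
    ((0.1, 0.2), 0),
    ((0.3, 0.0), 0),
    ((2.0, 2.0), 0),
    ((0.1, 0.1), 1),
]
density = local_sample_density(samples, j=0, x0=(0.0, 0.0), r=0.4)
```

In this toy data, three of the five samples are class-0 points within radius $0.4$ of the origin, so the sketch yields a density of $0.6$. Under this reading, a point with many same-class neighbors nearby would sit in a high sample density region (HSDR).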
