Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal

Haonan An, Guang Hua, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang

TL;DR

This work identifies a vulnerability in box-free watermarking where the watermark decoder can be exploited by a gradient-based attacker to train a remover that eliminates the watermark. It introduces Decoder Gradient Shield (DGS), a closed-form defense that reorients and scales the gradient of watermarked queries through a positive definite matrix $P$, yielding the relation $Z^* = -P Z + (P+I)W$ and enabling a protected API that preserves decoder function while hindering learning of removal. The authors provide a detailed threat model, derive the gradient-based attack, and demonstrate through deraining and style transfer experiments that DGS prevents watermark removal without sacrificing output fidelity, showing robustness to common post-processing attacks. The findings offer a practical IP protection mechanism for box-free watermarking in image-to-image models and identify avenues for future work on countering reverse engineering of the defense.
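The relation $Z^* = -P Z + (P+I)W$ can be checked numerically. The following is a minimal sketch, not the authors' implementation: the 2x2 matrix `P`, the toy watermark `W`, and the decoder outputs `Z_exact` and `Z_near` are hypothetical values chosen only for illustration. It shows two properties that follow directly from the relation: an exact decoder output $Z = W$ is returned unchanged, and any residual $Z - W$ is mapped to $-P(Z - W)$, which has a negative inner product with the original residual because $P$ is positive definite.

```python
# Illustrative sketch only: P, W, and Z below are made-up toy values,
# not parameters from the paper.

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dgs_output(z, w, p):
    """Return Z* = -P Z + (P + I) W, the relation quoted in the TL;DR."""
    pz = matvec(p, z)
    pw = matvec(p, w)
    return [-a + b + c for a, b, c in zip(pz, pw, w)]

# Hypothetical symmetric positive definite matrix (eigenvalues 1 and 3).
P = [[2.0, 1.0],
     [1.0, 2.0]]
W = [1.0, 0.0]        # flattened toy "watermark"
Z_exact = [1.0, 0.0]  # decoder output that matches W exactly
Z_near = [0.9, 0.2]   # decoder output slightly off W

z_star_exact = dgs_output(Z_exact, W, P)
z_star_near = dgs_output(Z_near, W, P)

# Residuals with respect to W, before and after the shield.
residual = [a - b for a, b in zip(Z_near, W)]
shielded_residual = [a - b for a, b in zip(z_star_near, W)]

# Algebraically, Z* - W = -P (Z - W).
expected = [-x for x in matvec(P, residual)]

# Inner product of the two residuals; negative for any nonzero residual
# when P is positive definite.
inner = sum(a * b for a, b in zip(residual, shielded_residual))

print("Z* for exact output:", z_star_exact)
print("shielded residual:", shielded_residual)
print("inner product:", inner)
```

The smaller the residual $Z - W$, the closer the returned $Z^*$ stays to $W$, which is consistent with the summary's claim that the shield preserves decoder function on watermarked queries while changing the direction an attacker would infer from the output.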

Abstract

The intellectual property of deep image-to-image models can be protected by so-called box-free watermarking, which uses an encoder to embed invisible copyright marks into the model's output images and a decoder to extract them. Prior works have improved watermark robustness, focusing on the design of better watermark encoders. In this paper, we reveal an overlooked vulnerability of the unprotected watermark decoder, which is jointly trained with the encoder and can be exploited to train a watermark removal network. To defend against such an attack, we propose the decoder gradient shield (DGS), a protection layer in the decoder API that prevents gradient-based watermark removal with a closed-form solution. The fundamental idea is inspired by the classical adversarial attack, but is utilized for the first time as a defensive mechanism in box-free model watermarking. We then demonstrate that DGS can reorient and rescale the gradient directions of watermarked queries and stop the watermark remover's training loss from converging to the level reached without DGS, while retaining decoder output image quality. Experimental results verify the effectiveness of the proposed method. The code will be made available upon acceptance.

Paper Structure

This paper contains 24 sections, 17 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Flowchart of box-free model watermarking for image-to-image models. The thin black arrows represent the black-box querying flow (processing and watermarking), while the thick colored arrows represent potential watermark extraction, and each colored arrow pair corresponds to a single input-output pair for $\mathbb{D}$.
  • Figure 2: Flowchart of gradient-based removal attack. The gradient backpropagated from $\mathbb{D}$ can be estimated by leveraging black-box adversarial attacks. In our setting, however, we assume the attacker can directly obtain the gradient without estimation.
  • Figure 3: Flowchart of the proposed DGS in the black-box API of $\mathbb{D}$.
  • Figure 4: Demonstration of the convergence behavior of the attacker's removal loss functions when training $\mathbb{R}$, under different choices of $P$. The first row shows deraining and the second row shows style transfer. The $\ell_1$ loss is $\|Z^{\ast} - W_0\|_1^2$, the $\ell_2$ loss is $\|Z^{\ast} - W_0\|_2^2$, while the consistent loss is adopted from zhang2020model. The loss corresponding to no defense is $\|Z - W_0\|_2^2$.
  • Figure 5: Demonstration of the convergence behavior of the true loss function $\|Z - W_0\|_2^2$ after deploying the proposed DGS, under different choices of $P$, with deraining as an example.
  • ...and 3 more figures