Unveiling the Achilles' Heel: Backdoor Watermarking Forgery Attack in Public Dataset Protection

Zhiying Li, Zhi Liu, Dongjie Liu, Shengda Zhuo, Guanggang Geng, Jian Weng, Shanxiang Lyu, Xiaobo Jin

TL;DR

A Forgery Watermark Generator (FW-Gen) is designed to generate forged watermarks, and a distillation loss between the original watermark and the forged watermark is defined to transfer the information in the original watermark to the forged watermark.

Abstract

High-quality datasets can greatly promote the development of technology. However, dataset construction is expensive and time-consuming, and public datasets are easily exploited by opportunists who are greedy for quick gains, which seriously infringes the rights and interests of dataset owners. At present, backdoor watermarks redefine dataset protection as proof of ownership and become a popular method to protect the copyright of public datasets, which effectively safeguards the rights of owners and promotes the development of open source communities. In this paper, we question the reliability of backdoor watermarks and re-examine them from the perspective of attackers. On the one hand, we refine the process of backdoor watermarks by introducing a third-party judicial agency to enhance its practical applicability in real-world scenarios. On the other hand, by exploring the problem of forgery attacks, we reveal the inherent flaws of the dataset ownership verification process. Specifically, we design a Forgery Watermark Generator (FW-Gen) to generate forged watermarks and define a distillation loss between the original watermark and the forged watermark to transfer the information in the original watermark to the forged watermark. Extensive experiments show that forged watermarks have the same statistical significance as original watermarks in copyright verification tests under various conditions and scenarios, indicating that dataset ownership verification results are insufficient to determine infringement. These findings highlight the unreliability of backdoor watermarking methods for dataset ownership verification and suggest new directions for enhancing methods for protecting public datasets.

Paper Structure

This paper contains 23 sections, 17 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Overview of backdoor watermarking. Stage 1: The dataset owner releases a watermarked dataset with target labels to the public (protected public dataset). Stage 2: The attacker uses the protected dataset to train a deep learning network without permission and provides a model API for query. Stage 3: The dataset owner queries the model API for responses to the original watermarked images and benign images, and determines copyright infringement through hypothesis testing methods.
  • Figure 2: The main process of the court judgment. Stage 4A: The dataset owner sues the attacker in court and submits evidence (Evidence O in the figure), including the suspicious model's responses $P_{ow}$ and $P_{bn}$ to the original watermarked and benign images, the target label $\tilde{y}$, the predicted label $y_{ow}$, and the original watermark $t_{ow}$. Stage 4B: During the trial, the evidence provided by the dataset owner shows that the attacker has infringed the copyright, while the attacker cannot provide any evidence (Evidence A in the figure). Stage 4C: The court announces the final judgment: since the dataset owner provides sufficient evidence and the attacker cannot provide any, the attacker is found to have infringed the copyright.
  • Figure 3: Summary of the proposed forgery attack scheme, which includes several key steps: acquisition of watermark information, training of the forged model, generation of rebuttal evidence, and court judgment. OW denotes the original watermark and FW denotes the forged watermark. Evidence O includes the suspicious model's responses to the original watermarked and benign images ($P_{ow}$ and $P_{bn}$), the target label $\tilde{y}$, the predicted label $y_{ow}$, and the original watermark $t_{ow}$. Evidence A includes the suspicious model's responses to the forged watermarked and benign images ($P_{fw}$ and $P_{bn}$), the target label $\tilde{y}$, the predicted label $y_{fw}$, and the forged watermark $t_{fw}$.
  • Figure 4: Examples of benign, original watermarked (OW), and forged watermarked (FW) images using the Badnets and Blended backdoor watermark attack methods, where the crosses and lines represent different styles of the original watermark, respectively.
  • Figure 5: Comparisons of PSNR, SSIM, and MSE metrics using the Badnets method and ResNet-18 models trained on the CIFAR-10 and ImageNet datasets as target models, with Original-Forged, Original-Benign, and Forged-Benign representing the comparative pairs of original watermarked images and forged watermarked images, original watermarked images and benign images, and forged watermarked images and benign images, respectively.
  • ...and 1 more figure
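
The hypothesis-testing step in Stage 3 of Figure 1 can be illustrated with a minimal sketch. The captions do not specify the exact test, so everything here is an assumption made for illustration: a one-sided paired test with a margin `tau`, a normal approximation for the p-value, and synthetic probabilities standing in for real model responses. The function name `paired_one_sided_test` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_one_sided_test(p_marked, p_bn, tau=0.2):
    """Test H1: mean(p_marked - p_bn) > tau with a paired t statistic.

    p_marked: target-label probabilities on watermarked images.
    p_bn:     target-label probabilities on the matching benign images.
    tau:      assumed margin (hypothetical value, not from the source).
    The p-value uses a normal approximation, which is reasonable only
    for large sample sizes.
    """
    diffs = [w - b - tau for w, b in zip(p_marked, p_bn)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value


# Synthetic stand-ins for model API responses (not real experimental data).
rng = random.Random(0)
p_bn = [rng.uniform(0.0, 0.2) for _ in range(100)]
# A model trained on the watermarked data: the watermark raises the target probability.
p_ow = [min(1.0, b + rng.uniform(0.6, 0.8)) for b in p_bn]
# A model that never saw the watermark: the watermark has almost no effect.
p_clean = [min(1.0, max(0.0, b + rng.uniform(-0.05, 0.05))) for b in p_bn]

t_wm, p_wm = paired_one_sided_test(p_ow, p_bn)
t_cl, p_cl = paired_one_sided_test(p_clean, p_bn)
print(f"watermarked model: t={t_wm:.2f}, p={p_wm:.4f}")
print(f"clean model:       t={t_cl:.2f}, p={p_cl:.4f}")
```

Under these assumptions, a small p-value for the watermarked model would be read as evidence of use of the protected dataset. The forgery attack described in the abstract targets exactly this step: if responses to forged watermarked images ($P_{fw}$) pass the same test, the test result alone cannot identify the true owner.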