PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark

Cheng Wei, Yang Wang, Kuofeng Gao, Shuo Shao, Yiming Li, Zhibo Wang, Zhan Qin

TL;DR

This work designs a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness, and designs a hypothesis-test-guided dataset ownership verification based on the proposed watermark.

Abstract

Point clouds are now widely used in computer vision, but collecting them is time-consuming and expensive. Point cloud datasets are therefore valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially commercial or open-sourced ones that cannot be resold or used commercially without permission, we aim to identify whether a suspicious third-party model was trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which are sensitive to the number of categories, our method can watermark samples from all classes instead of only from the target one. Accordingly, it remains highly effective even on large-scale datasets with many classes. Specifically, we perturb selected point clouds from non-target categories in both shape-wise and point-wise manners, without changing their labels, before inserting trigger patterns. The features of the perturbed samples are similar to those of benign samples from the target class. As a result, models trained on the watermarked dataset exhibit a distinctive yet stealthy backdoor behavior: they misclassify samples from the target class whenever triggers appear, because the trained DNNs treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification based on the proposed watermark. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential removal methods.

Paper Structure (24 sections, 3 theorems, 15 equations, 16 figures, 11 tables, 2 algorithms)


Key Result

Proposition 1

Suppose $f(\bm{x})$ is the posterior probability of $\bm{x}$ predicted by the suspicious model. Let the variable $X$ denote a benign sample from the target class $y^{(t)}$ and let $X'$ denote its verified version (i.e., $X' = U(X, \Gamma)$). Let $P_b = f(X)_{y^{(t)}}$ and $P_v = f(X')_{y^{(t)}}$ ...
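
The statement of Proposition 1 is cut off in this excerpt, so the exact hypotheses are not visible here. As a rough illustration of how a hypothesis-test-guided check on $P_b$ and $P_v$ might look, the sketch below runs a one-sided paired test on the differences $P_b - P_v$. The null hypothesis form, the margin `tau`, the significance level `alpha`, and the function name `verify_ownership` are all assumptions made for illustration, and the p-value uses a normal approximation to the paired statistic rather than whatever test the paper specifies.

```python
import math
from statistics import NormalDist, mean, stdev


def verify_ownership(p_benign, p_verified, tau=0.2, alpha=0.01):
    """One-sided paired test of H0: E[P_b - P_v] <= tau vs H1: E[P_b - P_v] > tau.

    p_benign[i]   : target-class posterior on the i-th benign target-class sample
    p_verified[i] : target-class posterior on the same sample with the trigger added
    tau, alpha    : hypothetical margin and significance level (assumptions)

    Returns (p_value, reject_h0). The p-value comes from a normal
    approximation, which is only reasonable for moderately large samples.
    """
    if len(p_benign) != len(p_verified) or len(p_benign) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    # Shift each paired difference by the margin so that H0 becomes "mean <= 0".
    diffs = [b - v - tau for b, v in zip(p_benign, p_verified)]
    m = len(diffs)
    sd = stdev(diffs)

    if sd == 0.0:
        # Degenerate case: every shifted difference is identical.
        p_value = 0.0 if diffs[0] > 0 else 1.0
    else:
        stat = mean(diffs) / (sd / math.sqrt(m))
        p_value = 1.0 - NormalDist().cdf(stat)

    return p_value, p_value < alpha


if __name__ == "__main__":
    # Made-up posteriors, for illustration only (not results from the paper).
    benign = [0.90, 0.95, 0.85, 0.92, 0.88, 0.91, 0.87, 0.93]
    triggered = [0.10, 0.20, 0.15, 0.05, 0.12, 0.18, 0.09, 0.14]
    print(verify_ownership(benign, triggered))
```

Rejecting the null hypothesis here would mean the trigger lowers the target-class posterior by more than the chosen margin, which is the "deny predicting the target label" behavior the abstract describes. Under the black-box setting, only the model's output probabilities are needed.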

Figures (16)

  • Figure 1: The limitations of existing backdoor attacks that could be used as watermarks to protect point cloud datasets. (a) Existing poison-label backdoor watermarks (i.e., PCBA [xiang2021backdoor], PointPBA [li2021pointba], IRBA [gao2023imperceptible]) are not stealthy under human inspection due to sample-label mismatch. (b) The only existing clean-label backdoor watermark (i.e., PointCBA [li2021pointba]) has limited effect (measured by watermark success rate (WSR)) when the protected dataset contains many categories.
  • Figure 2: The main pipeline of dataset ownership verification based on our negative clean-label backdoor watermark for point clouds (PointNCBW). In the watermarking stage, we generate the watermarked version of the original dataset. Specifically, we design and exploit transferable feature perturbation (TFP) to perturb a few selected point clouds from the original dataset with non-target categories so that they lie close to those from the target class in the feature space. Our TFP has two steps, including shape-wise and point-wise perturbations, to ensure transferability across model structures. In the verification stage, we verify whether a suspicious third-party model is trained on our protected dataset by examining whether it misclassifies samples from the target class containing owner-specified trigger patterns via the hypothesis test.
  • Figure 3: Examples of point clouds involved in different backdoor watermarks.
  • Figure 4: Effects of watermarking rate $\lambda$ and size of verification set $m$ on the performance of PointNCBW-based dataset ownership verification on the ModelNet40 dataset.
  • Figure 5: Effects of regularization hyper-parameter $\eta$ on the magnitude of point-wise perturbation measured by Chamfer distance ($D_{ch}$). In general, the smaller the distance, the more imperceptible the point-wise perturbations.
  • ...and 11 more figures
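
Figure 5 measures the magnitude of point-wise perturbation with the Chamfer distance $D_{ch}$. The excerpt does not give the paper's exact normalization, so the sketch below uses one common symmetric form (the mean squared nearest-neighbor distance in each direction, summed); the function name `chamfer_distance` and the choice of squared distances are assumptions.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumed form: mean squared nearest-neighbor distance from a to b,
    plus the same quantity from b to a.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Pairwise squared Euclidean distances via broadcasting, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Nearest neighbor in b for each point of a, and vice versa.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


if __name__ == "__main__":
    original = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    perturbed = original + np.array([0.0, 0.0, 0.1])  # shift every point along z
    print(chamfer_distance(original, perturbed))
```

A smaller value means the perturbed cloud stays closer to the original, which matches the caption's reading that smaller distances correspond to less perceptible perturbations. The dense pairwise matrix costs O(NM) memory, so larger clouds would usually call for a nearest-neighbor index instead.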

Theorems & Definitions (4)

  • Proposition 1
  • Theorem 1
  • Theorem 1
  • proof