
Unlearnable Examples Give a False Sense of Data Privacy: Understanding and Relearning

Pucheng Dang, Xing Hu, Kaidi Xu, Jinhao Duan, Di Huang, Husheng Han, Rui Zhang, Zidong Du

TL;DR

Unlearnable examples create a false sense of data privacy: they inject imperceptible perturbations that hijack model training toward perturbation features. The authors reveal a transient learning window in which semantic features can still be learned, but models quickly become trapped in recognizing perturbation features, a process driven largely by the shallow layers. They propose Activation Cluster Measurement (ACM) to quantify this trap, and Progressive Staged Training (ST), a layer-wise learning-rate schedule that slows the shallow layers when ACM signals perturbation learning, preventing entrapment. Across CIFAR-10/100 and ImageNet-mini, ST defeats all state-of-the-art unlearnable methods while preserving clean accuracy, providing a strong baseline for evaluating unlearnable techniques. The work suggests that current unlearnable perturbations do not provide real data privacy, and it offers a practical framework for evaluating and counteracting such attacks.
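The TL;DR does not give the ACM formula. As a hedged illustration of what "measuring how activations cluster" can mean, here is a hypothetical proxy, `cluster_ratio`. It is an assumption for illustration, not the authors' ACM definition. It takes penultimate-layer features (as visualized in Figure 5) and divides the mean distance of each sample to its class centroid by the mean distance between class centroids, so smaller values indicate tighter, better-separated class clusters.

```python
import numpy as np


def cluster_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical clustering proxy (NOT the paper's ACM formula).

    Returns mean intra-class spread divided by mean inter-centroid distance.
    Smaller values mean tighter, better-separated class clusters.
    """
    classes = np.unique(labels)  # sorted unique class ids
    num_classes = len(classes)
    if num_classes < 2:
        raise ValueError("need at least two classes")
    # Class centroids of the (e.g. penultimate-layer) activations.
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Map each label to its row in `centroids` (classes is sorted).
    idx = np.searchsorted(classes, labels)
    # Intra-class spread: distance of each sample to its own class centroid.
    intra = np.linalg.norm(features - centroids[idx], axis=1).mean()
    # Inter-class separation: mean distance over all distinct centroid pairs.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    inter = dists[np.triu_indices(num_classes, k=1)].mean()
    return float(intra / inter)
```

In a training loop, one could compute such a score on held-out activations at each epoch and watch how it evolves; which direction (and which threshold) signals perturbation learning would have to come from the paper's actual ACM definition.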

Abstract

Unlearnable examples have been proposed to prevent third parties from exploiting unauthorized data; they are generated by adding imperceptible perturbations to publicly released data. These unlearnable examples effectively misdirect the training process, leading the model to learn perturbation features while neglecting the semantic features of the image. In this paper, we conduct an in-depth analysis and observe that models can learn both the image features and the perturbation features of unlearnable examples at an early training stage, but rapidly become trapped in learning perturbation features, because the shallow layers tend to learn perturbation features and propagate harmful activations to deeper layers. Based on these observations, we propose Progressive Staged Training, a self-adaptive training framework specifically designed to break unlearnable examples. The proposed framework effectively prevents models from becoming trapped in learning perturbation features. We evaluate our method on multiple model architectures over diverse datasets, including CIFAR-10, CIFAR-100, and ImageNet-mini. Our method circumvents the unlearnability of all state-of-the-art methods in the literature, revealing that existing unlearnable examples give a false sense of privacy protection, and it provides a reliable baseline for further evaluation of unlearnable techniques.
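To make "imperceptible perturbations added to publicly released data" concrete, here is a minimal sketch, assuming images scaled to [0, 1] and an L-infinity budget of 8/255 (a common choice in this literature, assumed here). The function `add_classwise_perturbation` is hypothetical: it samples one fixed random pattern per class, whereas real unlearnable methods such as REM optimize the perturbation rather than drawing it at random.

```python
import numpy as np


def add_classwise_perturbation(images: np.ndarray, labels: np.ndarray,
                               num_classes: int, eps: float = 8 / 255,
                               seed: int = 0) -> np.ndarray:
    """Illustrative stand-in for an unlearnable perturbation.

    images: float array of shape (N, ...) with values in [0, 1].
    labels: int array of shape (N,) with values in [0, num_classes).
    Each class gets one fixed random pattern bounded by eps in L-infinity
    norm; real methods optimize this pattern instead of sampling it.
    """
    rng = np.random.default_rng(seed)
    # One pattern per class, uniform in [-eps, eps], same shape as an image.
    patterns = rng.uniform(-eps, eps, size=(num_classes,) + images.shape[1:])
    # Every sample of class c receives the same pattern patterns[c].
    perturbed = images + patterns[labels]
    # Keep pixels in the valid range; clipping can only shrink |delta|,
    # so the eps bound still holds afterwards.
    return np.clip(perturbed, 0.0, 1.0)
```

Because the pattern is tied to the label, it forms an easy "shortcut" feature that correlates perfectly with the class; this is the kind of perturbation feature the abstract says models get trapped in.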

Paper Structure

This paper contains 27 sections, 7 equations, 14 figures, 10 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Test accuracy of a ResNet-18 trained with normal training (NT), adversarial training (AT), Chroma-CVE, and our ST series on CIFAR-10 datasets protected by various unlearnable methods, such as REM, SP, OPS, and HYPO. The results show that our training method ST defeats the data protection of these unlearnable methods and is highly effective at preserving clean accuracy.
  • Figure 2: An overview of ST when countering unlearnable perturbations. Unlike normal training, once the ACM indicator (see the ACM equation) detects that the model is learning unlearnable perturbation features, progressive staged training (ST) recovers clean accuracy on unlearnable examples by adjusting the learning rate of each layer (see the per-layer learning-rate equation).
  • Figure 3: Sketch of $\bm{\theta}_u^D$ and $\bm{\theta}_u^S$. Their weights are derived from a model trained on clean data (middle row) by replacing either its deep or its shallow layers. The results of training $\bm{\theta}_u^S$ and $\bm{\theta}_u^D$ on unlearnable data are shown in Figure 4.
  • Figure 4: Top: training and test accuracy of a ResNet-18 trained on clean data ($\bm{\theta}_c$) and on unlearnable data ($\bm{\theta}_u$). As indicated by the orange arrow, after epoch 3 the training accuracy of $\bm{\theta}_u$ increases sharply while its test accuracy drops significantly, implying that $\bm{\theta}_u$ is trapped in the unlearnable perturbation features. Bottom: training and test accuracy of $\bm{\theta}_u^S$ (green) and $\bm{\theta}_u^D$ (blue) on unlearnable data. $\bm{\theta}_u^S$ achieves higher accuracy, indicating that shallow layers are more crucial for a model to learn correct features during training.
  • Figure 5: t-SNE visualizations of penultimate-layer activations, with the corresponding ACM values (see the ACM equation), for various models at different epochs. $\bm{\theta}_c$ is a model trained normally on clean data, $\bm{\theta}_u$ is a model trained normally on unlearnable data, and $\bm{\theta}_s$ is a model trained with ST on unlearnable data. The t-SNE results and ACM values of $\bm{\theta}_s$ resemble those of $\bm{\theta}_c$ and differ from those of $\bm{\theta}_u$, indicating that ST prevents models from learning unlearnable perturbation features.
  • ...and 9 more figures
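Figure 2 describes ST as adjusting the learning rate of each layer once ACM detects perturbation learning, and the TL;DR says the shallow layers are the ones slowed down. The sketch below illustrates only that idea, assuming layers indexed from input (0) to output, a hypothetical boundary `num_shallow`, and a hypothetical `slow_factor`; the function name `staged_layer_lrs` is ours, and the paper's progressive, stage-wise rule is not reproduced here.

```python
from typing import List


def staged_layer_lrs(base_lr: float, num_layers: int, num_shallow: int,
                     trap_detected: bool, slow_factor: float = 0.1) -> List[float]:
    """Hypothetical per-layer learning rates in the spirit of ST.

    Layers are indexed from the input (0) to the output (num_layers - 1).
    When the indicator fires, layers with index < num_shallow are slowed
    by slow_factor; deeper layers keep base_lr. Not the paper's exact rule.
    """
    if not 0 <= num_shallow <= num_layers:
        raise ValueError("num_shallow must be between 0 and num_layers")
    if not trap_detected:
        # No sign of perturbation learning: train every layer normally.
        return [base_lr] * num_layers
    # Indicator fired: slow the shallow layers, leave deep layers unchanged.
    return [base_lr * slow_factor if i < num_shallow else base_lr
            for i in range(num_layers)]
```

In a framework such as PyTorch, each returned value could be written into the `lr` entry of a corresponding optimizer parameter group, one group per layer, so that the schedule changes without rebuilding the optimizer.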