Unlearnable Examples Give a False Sense of Data Privacy: Understanding and Relearning
Pucheng Dang, Xing Hu, Kaidi Xu, Jinhao Duan, Di Huang, Husheng Han, Rui Zhang, Zidong Du
TL;DR
Unlearnable examples create a false sense of data privacy by injecting imperceptible perturbations that steer model training toward perturbation features. The authors reveal a transient learning window in which semantic features can still be learned, after which models quickly become trapped in learning perturbation features, especially in shallow layers. They propose Activation Cluster Measurement (ACM) to quantify this trap and Progressive Staged Training (ST), a layer-wise learning-rate schedule that slows shallow layers when ACM signals perturbation learning, preventing entrapment. Across CIFAR-10/100 and ImageNet-mini, ST defeats all state-of-the-art unlearnable methods while preserving clean accuracy, and it provides a strong baseline for evaluating unlearnable techniques. This work implies that current unlearnable perturbations do not provide real data privacy, and it offers a practical framework for evaluating and counteracting such perturbations.
Abstract
Unlearnable examples have been proposed to prevent third parties from exploiting unauthorized data: imperceptible perturbations are added to data before it is publicly released. These unlearnable examples effectively misdirect model training, leading the model to learn perturbation features while neglecting the semantic features of the image. In this paper, we conduct an in-depth analysis and observe that models can learn both image features and perturbation features of unlearnable examples at an early training stage, but are rapidly trapped in learning perturbation features, because the shallow layers tend to learn perturbation features and propagate harmful activations to deeper layers. Based on these observations, we propose Progressive Staged Training, a self-adaptive training framework specifically designed to break unlearnable examples. The proposed framework effectively prevents models from becoming trapped in learning perturbation features. We evaluate our method on multiple model architectures over diverse datasets, e.g., CIFAR-10, CIFAR-100, and ImageNet-mini. Our method circumvents the unlearnability of all state-of-the-art methods in the literature, revealing that existing unlearnable examples give a false sense of privacy protection, and it provides a reliable baseline for further evaluation of unlearnable techniques.
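To make the idea of slowing shallow layers concrete, the following is a minimal sketch of a layer-wise learning-rate rule, not the authors' actual algorithm. The function name `staged_layer_lrs`, the scalar `trap_score` (standing in for whatever signal ACM produces), the `threshold`, and the linear `decay` ramp are all hypothetical choices made for illustration; the text above does not specify how ACM is computed or how the schedule is parameterized.

```python
from typing import List


def staged_layer_lrs(
    base_lr: float,
    num_layers: int,
    trap_score: float,      # hypothetical stand-in for an ACM-style signal
    threshold: float = 0.5, # assumed trigger level, not from the paper
    decay: float = 0.1,     # assumed scale applied to the shallowest layer
) -> List[float]:
    """Return one learning rate per layer; index 0 is the shallowest layer."""
    # Below the threshold, no perturbation learning is signalled:
    # every layer trains at the base rate.
    if trap_score <= threshold:
        return [base_lr] * num_layers

    # Above the threshold, slow the shallow layers the most. The scale
    # rises linearly from `decay` (shallowest) to 1.0 (deepest).
    lrs = []
    for i in range(num_layers):
        depth_frac = i / (num_layers - 1) if num_layers > 1 else 1.0
        scale = decay + (1.0 - decay) * depth_frac
        lrs.append(base_lr * scale)
    return lrs


if __name__ == "__main__":
    print(staged_layer_lrs(0.1, 4, trap_score=0.2))  # uniform rates
    print(staged_layer_lrs(0.1, 4, trap_score=0.9))  # shallow layers slowed
```

In a framework such as PyTorch, rates like these could be applied by giving each layer its own optimizer parameter group with a separate `lr`, and recomputing them whenever the monitoring signal is refreshed.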
