When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining

Zhihao Li; Gezheng Xu; Jiale Cai; Ruiyi Fang; Di Wu; Qicheng Lao; Charles Ling; Boyu Wang

TL;DR

This paper proposes BAIT (Binding Artificial perturbations to Incorrect Targets), a novel bi-level optimization formulation that effectively mitigates the influence of pretraining priors and maintains data unlearnability.

Abstract

Unlearnable Examples (UEs) serve as a data protection strategy that generates imperceptible perturbations to mislead models into learning spurious correlations instead of underlying semantics. In this paper, we uncover a fundamental vulnerability of UEs that emerges when learning starts from a pretrained model. Crucially, our empirical analysis shows that even when data are protected by carefully crafted perturbations, pretraining priors still furnish rich semantic representations that allow the model to circumvent the shortcuts introduced by UEs and capture genuine features, thereby nullifying unlearnability. To address this, we propose BAIT (Binding Artificial perturbations to Incorrect Targets), a novel bi-level optimization formulation. Specifically, the inner level aims at associating the perturbed samples with real labels to simulate standard data-label alignment, while the outer level actively disrupts this alignment by enforcing a mislabel-perturbation binding that maps samples to designated incorrect targets. This mechanism effectively overrides the semantic guidance of priors, forcing the model to rely on the injected perturbations and consequently preventing the acquisition of true semantics. Extensive experiments on standard benchmarks and multiple pretrained backbones demonstrate that BAIT effectively mitigates the influence of pretraining priors and maintains data unlearnability.
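The two levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear softmax classifier, a first-order alternating approximation of the bi-level problem, an L-infinity budget `eps`, sign-gradient updates, and a hypothetical target assignment `(y + 1) mod K`. None of these choices are stated in the abstract, and the function name `craft_perturbations` is illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def craft_perturbations(x, y, num_classes, eps=0.5, rounds=30,
                        inner_steps=5, lr_w=0.5, lr_delta=0.1, seed=0):
    """Toy alternating scheme: inner fit on true labels, outer binding to wrong targets."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumption: small random weights stand in for a pretrained model.
    w = rng.normal(scale=0.01, size=(d, num_classes))
    delta = np.zeros_like(x)
    # Assumption: each sample's incorrect target is the next class index.
    target = (y + 1) % num_classes
    y_true = np.eye(num_classes)[y]
    y_tgt = np.eye(num_classes)[target]
    for _ in range(rounds):
        # Inner level: train the model on perturbed data with the real labels.
        for _ in range(inner_steps):
            p = softmax((x + delta) @ w)
            w -= lr_w * (x + delta).T @ (p - y_true) / n
        # Outer level: move delta so perturbed samples are classified as the
        # incorrect targets (descend the cross-entropy to `target` w.r.t. input).
        p = softmax((x + delta) @ w)
        grad_delta = (p - y_tgt) @ w.T
        # Signed step, then projection back onto the L-infinity ball.
        delta = np.clip(delta - lr_delta * np.sign(grad_delta), -eps, eps)
    return delta, w, target

# Usage on synthetic data (shapes and sizes are arbitrary).
rng = np.random.default_rng(1)
x = rng.normal(size=(60, 8))
y = rng.integers(0, 3, size=60)
delta, w, target = craft_perturbations(x, y, num_classes=3)
```

The projection step keeps every perturbation within the assumed budget, which is the usual way imperceptibility is enforced in this line of work. A real pipeline would replace the linear model with a pretrained backbone and use automatic differentiation.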

Paper Structure (24 sections, 7 equations, 8 figures, 13 tables)

Introduction
Related Work
Proposed Method
Problem Setup
Binding Artificial Perturbations to Incorrect Targets
Optimization Strategy for Crafting Effective Unlearnable Examples
Experiments
Experimental Setup
Unlearnability under Pretraining Priors
Ablations and Further Analyses
Qualitative Analysis on the Optimization Process of BAIT
Conclusion
The Use of Large Language Models (LLMs).
Discussion on Parameter Updates and Semantic Learning
Details of the Parameter Updates Illustration
...and 9 more sections

Figures (8)

Figure 1: Empirical analysis of the vulnerability of UEs to pretraining priors. All experiments are conducted on CIFAR-10 with ResNet-18. (a) Existing UE methods suffer severe unlearnability degradation when applied to pretrained (PT) backbones instead of train-from-scratch (TS) models. (b) We progressively replace layers of a four-layer ImageNet-pretrained ResNet-18 with randomly initialized layers until obtaining a train-from-scratch model. The resistance to UEs steadily diminishes as pretraining priors are removed. (c) We report the normalized model parameter updates when training on clean data and UEs (details in the Appendix). We observe that effective perturbations result in minimal parameter updates. In contrast, pretrained models bypass the spurious correlations induced by EMN and remain fully optimized. (d) We plot the learning curve starting from a pretrained backbone. The concurrent rise in the training and test accuracy of EMN indicates that such substantial parameter updates drive the model to acquire real semantics rather than the injected shortcuts, thereby nullifying the unlearnability.
Figure 2: Evaluation of unlearnability transferability (test accuracy (%) $\downarrow$) across architectures, where UEs are generated using a ResNet-18 surrogate and evaluated against diverse pretrained backbones (including CNNs and ViTs).
Figure 3: Perturbed examples on CIFAR-10.
Figure 4: t-SNE visualization of the last layer features. Classifiers are trained on the perturbed training set and tested on the clean test set. The top row displays models trained in a train-from-scratch manner, whereas the bottom row shows models utilizing pretraining priors.
Figure 5: Training accuracy curve on CIFAR-10, illustrating that BAIT successfully misleads the ImageNet-pretrained surrogate model during perturbation optimization.
...and 3 more figures
