Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously

Yihan Wang, Yifan Zhu, Xiao-Shan Gao

TL;DR

This work tackles data protection by designing availability attacks that degrade performance under both supervised and contrastive learning. It shows that CL-focused poisoning is not sufficient for SL protection, and proposes AUE and AAP, which embed contrastive-like augmentations into supervised poisoning to induce dual unlearnability. The methods achieve state-of-the-art worst-case unlearnability across diverse datasets (including high-resolution ones) with significantly improved efficiency over CL-based attacks, and they generalize across architectures and CL variants. The results indicate practical viability for real-world data protection, enabling scalable defense against data abusers who exploit both supervised and self-supervised learning paradigms.
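The "worst-case unlearnability" mentioned above can be read as follows: the protection is only as strong as the most successful training algorithm a data abuser could try. A minimal sketch of that reading, assuming the metric is the highest test accuracy over all evaluated algorithms (the function name, algorithm names, and accuracy values are hypothetical placeholders, not results from the paper):

```python
from typing import Mapping, Tuple


def worst_case_unlearnability(test_acc: Mapping[str, float]) -> Tuple[str, float]:
    """Return the algorithm that recovers the most accuracy on poisoned data.

    Assumption: lower is better for the defender, so the worst case is the
    maximum test accuracy over all evaluated SL and CL algorithms.
    """
    if not test_acc:
        raise ValueError("need at least one evaluated algorithm")
    best_algo = max(test_acc, key=lambda name: test_acc[name])
    return best_algo, test_acc[best_algo]


# Placeholder accuracies for illustration only; NOT results from the paper.
placeholder_acc = {"SL": 0.12, "SimCLR": 0.48, "MoCo": 0.41}
algo, acc = worst_case_unlearnability(placeholder_acc)
print(f"worst case: {algo} reaches {acc:.2f} test accuracy")
```

Under this reading, an attack that drives supervised accuracy very low but leaves one contrastive algorithm largely unaffected still scores poorly, which is the failure mode the paper targets.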

Abstract

Availability attacks can prevent the unauthorized use of private data and commercial datasets by adding imperceptible noise to produce unlearnable examples before release. Ideally, the resulting unlearnability prevents algorithms from training usable models. When supervised learning (SL) algorithms fail, a malicious data collector may resort to contrastive learning (CL) algorithms to bypass the protection. Our evaluation shows that most existing methods cannot achieve both supervised and contrastive unlearnability, which puts data protection at risk. Unlike recent methods based on contrastive error minimization, we employ contrastive-like data augmentations in supervised error-minimization or error-maximization frameworks to obtain attacks that are effective against both SL and CL. Our proposed AUE and AAP attacks achieve state-of-the-art worst-case unlearnability across SL and CL algorithms at lower computational cost, showing promise for real-world applications.
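The two supervised frameworks named in the abstract can be sketched as optimization problems. The formulas below are a hedged reading, not the paper's exact definitions: $f_\theta$ (the classifier), $\boldsymbol{\delta}$ (the per-sample perturbation), $\epsilon$ (the $\ell_\infty$ budget), and $\theta^{*}$ (a fixed pretrained model) are assumed symbols, and $\pi \sim \mu$ is taken to be a contrastive-like augmentation drawn from an augmentation distribution $\mu$.

Error minimization with augmentations (the AUE direction):

$$
\min_{\theta}\ \mathbb{E}_{(\boldsymbol{x},y)\sim\mathcal{D}}\ \min_{\|\boldsymbol{\delta}\|_\infty \le \epsilon}\ \mathbb{E}_{\pi\sim\mu}\ \mathcal{L}_{\mathrm{SL}}\big(f_\theta(\pi(\boldsymbol{x}+\boldsymbol{\delta})),\, y\big)
$$

Error maximization with augmentations (the AAP direction), with the model held fixed:

$$
\max_{\|\boldsymbol{\delta}\|_\infty \le \epsilon}\ \mathbb{E}_{\pi\sim\mu}\ \mathcal{L}_{\mathrm{SL}}\big(f_{\theta^{*}}(\pi(\boldsymbol{x}+\boldsymbol{\delta})),\, y\big)
$$

In both cases the only change relative to a plain supervised poisoning objective is the expectation over $\pi$; the paper's actual formulation may differ in detail, for example by using a targeted rather than untargeted maximization.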

Paper Structure (40 sections, 5 theorems, 19 equations, 8 figures, 14 tables)

Key Result

Proposition 4.1

Let $\mathcal{E}_{\mathrm{SL}} = \mathbb{E}_{\mathcal{D}, \mu}[\mathcal{L}_{\mathrm{SL}}(\boldsymbol{x}, y, \pi)]$. With probability at least $1 - 4\sqrt{\mathcal{E}_{\mathrm{SL}}}$, it holds

Figures (8)

  • Figure 1: Illustration of our methods. The bottom-left flow (blue) shows our supervised-learning-based poison generation. The top-left flow shows contrastive-learning-based poison generation. The flows on the right show the multiple supervised and contrastive learning evaluations on the poisoned data.
  • Figure 2: InfoNCE loss decreases with CE loss.
  • Figure 3: (a) Contrastive losses during SimCLR training under UE and our AUE attacks. (b) Alignment and uniformity gaps during the SimCLR training on CIFAR-10 poisoned by our AUE attack.
  • Figure 4: Visualization of poisoning on CIFAR-10. Left: Perturbation images. Right: t-SNE of perturbations.
  • Figure 5: Training process on poisoned CIFAR-10. Left: Supervised learning. Right: SimCLR.
  • ...and 3 more figures

Theorems & Definitions (11)

  • Proposition 4.1
  • Remark 4.2
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Lemma 4.5
  • proof
  • Lemma 4.6
  • proof
  • ...and 1 more