How to Craft Backdoors with Unlabeled Data Alone?

Yifei Wang, Wenhan Ma, Stefanie Jegelka, Yisen Wang

TL;DR

This work addresses backdoor threats in self-supervised learning (SSL) when an attacker has access only to unlabeled data, introducing the no-label backdoor (NLB) setting. It proposes two poison-selection strategies, clustering-based pseudolabeling and contrastive selection based on mutual information, to craft effective backdoors without labels, formalized via the Total Contrastive Similarity objective. Experiments on CIFAR-10 and ImageNet-100 across several SSL methods show that NLBs can achieve high backdoor success and degrade downstream performance, often outperforming random poisoning and showing partial resistance to certain defenses. The findings reveal a significant security risk for SSL-based foundation models, describe practical and scalable attack mechanisms, and motivate the development of robust defenses.

Abstract

Relying only on unlabeled data, self-supervised learning (SSL) can learn rich features in an economical and scalable way. As the workhorse for building foundation models, SSL has recently received a lot of attention and found wide application. This also raises security concerns, among which backdoor attacks are a major threat: if a released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected into test samples. The goal of this work is to investigate this potential risk. We notice that existing backdoors all require a considerable amount of labeled data that may not be available for SSL. To circumvent this limitation, we explore a more restrictive setting called no-label backdoors, in which we have access to unlabeled data alone and the key challenge is how to select a proper poison set without using label information. We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle. Experiments on CIFAR-10 and ImageNet-100 show that both no-label backdoors are effective on many SSL methods and outperform random poisoning by a large margin. Code will be available at https://github.com/PKU-ML/nlb.
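
To make the two selection strategies named in the abstract more concrete, the following is a minimal sketch on toy 2-D vectors, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names `contrastive_select` and `kmeans_select`, the use of cosine similarity, scoring a candidate set by the sum of its pairwise similarities, the greedy anchor-plus-nearest-neighbors search, and the choice of the largest K-means cluster as the poison set. In practice the feature vectors would presumably come from a pretrained SSL encoder rather than being written by hand.

```python
import math
import random


def _unit(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_select(features, budget):
    """Assumed greedy search: for each anchor, take its `budget` most similar
    samples (itself included) and score the group by the sum of all pairwise
    cosine similarities. Return the best group and its score."""
    unit = [_unit(v) for v in features]
    n = len(unit)
    sim = [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(n)]
           for i in range(n)]
    best_set, best_score = [], -math.inf
    for anchor in range(n):
        group = sorted(range(n), key=lambda j: sim[anchor][j], reverse=True)[:budget]
        score = sum(sim[i][j] for i in group for j in group)
        if score > best_score:
            best_set, best_score = sorted(group), score
    return best_set, best_score


def kmeans_select(features, k, budget, iters=20, seed=0):
    """Assumed clustering variant: plain K-means gives pseudolabels, and the
    poison set is the `budget` members of the largest cluster that lie
    closest to its center."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(features, k)]
    assign = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each sample gets the pseudolabel of its nearest center.
        assign = [min(range(k), key=lambda c: _sq_dist(x, centers[c]))
                  for x in features]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [x for x, a in zip(features, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    target = max(range(k), key=lambda c: assign.count(c))
    members = [i for i, a in enumerate(assign) if a == target]
    members.sort(key=lambda i: _sq_dist(features[i], centers[target]))
    return sorted(members[:budget])


# Hypothetical toy features: a tight group (indices 0-4) and a looser one (5-7).
toy_features = [
    (5.0, 0.1), (5.1, -0.1), (4.9, 0.0), (5.0, 0.2), (5.2, -0.2),
    (-0.5, 0.5), (-0.5, -0.5), (-0.7, 0.0),
]
print(contrastive_select(toy_features, budget=3))
print(kmeans_select(toy_features, k=2, budget=3))
```

On this toy data both functions pick indices from the tight group, which reflects the intuition that a useful poison set consists of samples the encoder already treats as mutually similar, whereas a random subset would mix the two groups.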

Paper Structure (26 sections, 12 equations, 8 figures, 13 tables, 2 algorithms)

Figures (8)

  • Figure 1: A practical poisoning scenario in which the attacker has access to unlabeled data alone.
  • Figure 2: Analysis of clustering-based NLB on CIFAR-10. a) Cluster Consistency Rate (CCR) of different clusters (obtained with multiple runs). b) CCR across $10$ random runs. c) Backdoor performance with three no-label backdoors: random selection, clustering with good CCR ($86.93\%$), clustering with bad CCR ($45.48\%$).
  • Figure 3: Analysis of contrastive NLB on CIFAR-10. a) Backdoor performance of different variants of contrastive selection. b) & c) t-SNE visualization of random and contrastive selection.
  • Figure 4: No-label backdoor on SimCLR with different poisoning rates.
  • Figure 5: t-SNE visualization of CIFAR-10 test features using a SimCLR encoder attacked by the contrastive backdoor. Different colors mark different classes.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Remark 3.1