CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

Jinghuai Zhang, Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong

TL;DR

This work proposes CorruptEncoder, a new data poisoning based backdoor attack (DPBA) against contrastive learning (CL). CorruptEncoder introduces a new strategy for creating poisoned inputs and uses a theory-guided method to maximize attack effectiveness, and it substantially outperforms existing DPBAs.

Abstract

Contrastive learning (CL) pre-trains general-purpose encoders on an unlabeled pre-training dataset consisting of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the resulting encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take the first step to analyze the limitations of existing backdoor attacks and propose new DPBAs, called CorruptEncoder, against CL. CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA that achieves attack success rates above 90% with only a few (3) reference images and a small poisoning ratio of 0.5%. Moreover, we propose a defense against DPBAs, called localized cropping. Our results show that the defense reduces the effectiveness of DPBAs but sacrifices the utility of the encoder, highlighting the need for new defenses.

Paper Structure (27 sections, 2 theorems, 19 equations, 14 figures, 13 tables, 2 algorithms)

This paper contains 27 sections, 2 theorems, 19 equations, 14 figures, 13 tables, 2 algorithms.

Key Result

Theorem 1

Suppose the left-right layout or the bottom-top layout is used. For the left-right layout, the optimal location of the reference object in the background image is $(o_x^*,o_y^*)=(0,0)$. For the bottom-top layout, the optimal location is $(o_x^*,o_y^*)=(0, b_h-o_h)$. The ...
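
The reference-object part of Theorem 1 can be written as a small lookup. The sketch below is illustrative only: the function name `optimal_object_location` and the layout strings are hypothetical, and it assumes $b_h$ and $o_h$ denote the heights of the background image and the reference object, as the notation suggests. It covers only the reference-object location, since the rest of the theorem statement is cut off in this excerpt.

```python
from typing import Tuple


def optimal_object_location(layout: str, b_h: int, o_h: int) -> Tuple[int, int]:
    """Return (o_x*, o_y*) for the reference object, as stated in Theorem 1.

    layout: "left-right" or "bottom-top" (hypothetical string labels).
    b_h:    height of the background image (assumed meaning of b_h).
    o_h:    height of the reference object (assumed meaning of o_h).
    """
    if o_h > b_h:
        # The object must fit inside the background image vertically.
        raise ValueError("reference object is taller than the background image")
    if layout == "left-right":
        # Theorem 1: (o_x*, o_y*) = (0, 0)
        return (0, 0)
    if layout == "bottom-top":
        # Theorem 1: (o_x*, o_y*) = (0, b_h - o_h)
        return (0, b_h - o_h)
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, with a background height of 100 and an object height of 40, the bottom-top layout gives `(0, 60)`, while the left-right layout always gives `(0, 0)`.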

Figures (14)

  • Figure 1: Reference image (left) vs. reference object (right).
  • Figure 2: Illustration of the optimal size ($b_w^*$, $b_h^*$) of the background image and optimal locations ($(o_x^*, o_y^*)$ and $(e_x^*,e_y^*)$) of the reference object and trigger in the background image when crafting a poisoned image.
  • Figure 3: The probability $p$ as a function of $b_w/o_w$ for left-right layout and $b_h/o_h$ for bottom-top layout. The curves are consistent with our empirical results of ASRs in panel (a) of the ablation figure (label `ablation3-1`).
  • Figure 4: CorruptEncoder+ uses support poisoned images to pull reference objects and other images in the target class close in the feature space so that the reference object can be correctly classified by a downstream classifier.
  • Figure 5: Impact of pre-training settings on CorruptEncoder.
  • ...and 9 more figures

Theorems & Definitions (4)

  • Theorem 1: Locations of Reference Object and Trigger
  • proof
  • Theorem 2: Size of Background Image
  • proof