CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

Jinghuai Zhang; Hongbin Liu; Jinyuan Jia; Neil Zhenqiang Gong

TL;DR

This work analyzes the limitations of existing data poisoning based backdoor attacks (DPBAs) on contrastive learning (CL) and proposes a new DPBA called CorruptEncoder. CorruptEncoder introduces a new strategy for creating poisoned inputs, uses a theory-guided method to maximize attack effectiveness, and substantially outperforms existing DPBAs.

Abstract

Contrastive learning (CL) pre-trains general-purpose encoders using an unlabeled pre-training dataset, which consists of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take the first step to analyze the limitations of existing backdoor attacks and propose new DPBAs to CL, called CorruptEncoder. CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA that achieves attack success rates above 90% with only a few (3) reference images and a small poisoning ratio of 0.5%. Moreover, we propose a defense, called localized cropping, against DPBAs. Our results show that the defense can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.

Paper Structure (27 sections, 2 theorems, 19 equations, 14 figures, 13 tables, 2 algorithms)

Introduction
Threat Model
CorruptEncoder
Crafting Poisoned Images
Theoretical Analysis
CorruptEncoder+
Experiments
Experimental Setup
Experimental Results
Defense
Extension to Multi-modal CL
Related Work
Conclusion
Proof of Theorem 1
Proof of Theorem 2
...and 12 more sections

Key Result

Theorem 1

Suppose the left-right layout or the bottom-top layout is used. For the left-right layout, the optimal location of the reference object in the background image is $(o_x^*,o_y^*)=(0,0)$. For the bottom-top layout, the optimal location is $(o_x^*,o_y^*)=(0, b_h-o_h)$. ...
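The reference-object locations quoted in Theorem 1 can be written out as a small lookup. This is only a minimal sketch, not the authors' implementation. It assumes that $b_h$ is the height of the background image and $o_h$ the height of the reference object, and that $(o_x, o_y)$ is the object's top-left corner in pixel coordinates with the origin at the top-left of the background image. The function name `optimal_object_location` and the layout strings are hypothetical, and the trigger location covered by the rest of the theorem is not included.

```python
from typing import Tuple


def optimal_object_location(layout: str, b_h: int, o_h: int) -> Tuple[int, int]:
    """Return (o_x*, o_y*) for the reference object as quoted in Theorem 1.

    Assumptions (not stated in the excerpt): b_h is the background height,
    o_h is the reference object height, and the returned point is the
    object's top-left corner in pixel coordinates.
    """
    if o_h > b_h:
        raise ValueError("reference object must fit inside the background image")
    if layout == "left-right":
        # Theorem 1: (o_x*, o_y*) = (0, 0) for the left-right layout.
        return (0, 0)
    if layout == "bottom-top":
        # Theorem 1: (o_x*, o_y*) = (0, b_h - o_h) for the bottom-top layout.
        return (0, b_h - o_h)
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, with a hypothetical background height of 224 and object height of 100, the bottom-top layout gives `(0, 124)`.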

Figures (14)

Figure 1: Reference image (left) vs. reference object (right).
Figure 2: Illustration of the optimal size ($b_w^*$, $b_h^*$) of the background image and optimal locations ($(o_x^*, o_y^*)$ and $(e_x^*,e_y^*)$) of the reference object and trigger in the background image when crafting a poisoned image.
Figure 3: The probability $p$ as a function of $b_w/o_w$ for left-right layout and $b_h/o_h$ for bottom-top layout. The curves are consistent with our empirical results of ASRs in Figure \ref{ablation3-1}(a).
Figure 4: CorruptEncoder+ uses support poisoned images to pull reference objects and other images in the target class close in the feature space so that the reference object can be correctly classified by a downstream classifier.
Figure 5: Impact of pre-training settings on CorruptEncoder.
...and 9 more figures

Theorems & Definitions (4)

Theorem 1: Locations of Reference Object and Trigger
proof
Theorem 2: Size of Background Image
proof
