Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao

TL;DR

This study proposes Unlearn Backdoor Threats (UBT), an efficient backdoor defense based on machine unlearning: it strategically builds a small set of poisoned samples and uses them to make the model rapidly unlearn its backdoor vulnerabilities.

Abstract

Multimodal contrastive learning uses multiple data modalities to learn high-quality features, but its reliance on large-scale data collected from the Internet makes it vulnerable to backdoor attacks. These attacks implant malicious behaviors during training that are activated by specific triggers at inference time, posing significant security risks. Existing fine-tuning-based countermeasures reduce the malicious impact of such attacks, but they often require extensive training time and degrade clean accuracy. In this study, we propose an efficient defense against backdoor threats based on machine unlearning, called Unlearn Backdoor Threats (UBT). The defense strategically constructs a small set of poisoned samples to help the model rapidly unlearn its backdoor vulnerabilities. Specifically, we use overfit training to strengthen backdoor shortcuts so that suspicious samples in the potentially poisoned dataset can be detected accurately. We then select a small subset of these suspicious samples for rapid forgetting, which eliminates the backdoor effect and improves the efficiency of the defense. For the unlearning step, we present a novel token-based portion unlearning training regime. This technique targets the model's compromised elements, dissociating backdoor correlations while preserving the model's overall integrity. Extensive experiments show that our method effectively defends against various backdoor attack methods on the CLIP model. Compared to state-of-the-art (SoTA) backdoor defense methods, UBT achieves the lowest attack success rate while maintaining high clean accuracy (attack success rate decreases by 19% compared to SoTA, while clean accuracy increases by 2.57%).
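The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that, after overfit training, suspicious image-text pairs are the ones with the highest image-text cosine similarity, and that the top k of them form the unlearning subset. The ranking rule, the function names, and the toy embeddings are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_unlearning_subset(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k pairs with the highest image-text similarity.

    Assumption (hypothetical rule): an overfitted model scores backdoored
    pairs as more similar than clean ones, so the top-k pairs are treated
    as the unlearning subset.
    """
    if len(image_embs) != len(text_embs):
        raise ValueError("image and text embeddings must be paired")
    scores = [cosine_similarity(i, t) for i, t in zip(image_embs, text_embs)]
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return ranked[:k]


# Toy embeddings (made up for illustration, not real CLIP features).
toy_image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_text_embs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
toy_subset = select_unlearning_subset(toy_image_embs, toy_text_embs, k=2)
```

In a real pipeline the embeddings would come from the overfitted model's image and text encoders, and k would be chosen to keep the unlearning set small.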

Paper Structure (28 sections, 4 theorems, 20 equations, 8 figures, 10 tables, 1 algorithm)

Key Result

Theorem 1

For any $\delta \in (0,1)$, with probability at least $1 - \delta$, the following inequality holds: For $h \in \mathcal{H}$, $L(h)=\mathbb{E}_{z \sim P}[l(z,h)]$ is the generalization risk, or simply risk, and $\hat{L}(D,h)=\frac{1}{|D|}\sum_{i=1}^{|D|}l((x_i,y_i),h)$ is the empirical loss. $KL(\cdot||\cdot)$ denotes the KL divergence.
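As a small illustration of the empirical loss $\hat{L}(D,h)$ defined above, the sketch below averages a per-example loss over a dataset. The 0-1 loss, the threshold hypothesis, and the toy dataset are assumptions chosen for illustration; the theorem itself does not fix a particular loss $l$ or hypothesis class here.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]          # one labeled example z = (x, y)
Hypothesis = Callable[[float], int]  # a hypothesis h maps x to a predicted label


def zero_one_loss(z: Example, h: Hypothesis) -> float:
    """l(z, h): 0.0 if h predicts the label correctly, 1.0 otherwise."""
    x, y = z
    return 0.0 if h(x) == y else 1.0


def empirical_loss(
    dataset: Sequence[Example],
    h: Hypothesis,
    loss: Callable[[Example, Hypothesis], float] = zero_one_loss,
) -> float:
    """Average of l((x_i, y_i), h) over the |D| examples in the dataset."""
    if not dataset:
        raise ValueError("dataset must be non-empty")
    return sum(loss(z, h) for z in dataset) / len(dataset)


def threshold_hypothesis(x: float) -> int:
    """Hypothetical hypothesis: predict 1 when x is at least 0.5."""
    return 1 if x >= 0.5 else 0


# Toy dataset: the hypothesis is wrong on the last two examples.
toy_dataset = [(0.2, 0), (0.8, 1), (0.6, 0), (0.4, 1)]
toy_empirical_loss = empirical_loss(toy_dataset, threshold_hypothesis)
```

The risk $L(h)$ is the expectation of the same per-example loss under the data distribution $P$, so it cannot be computed from a finite sample; the empirical loss is its sample-based estimate.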

Figures (8)

  • Figure 1: The depth of color in the figure indicates the model's performance. The more prominent the blue, the stronger the clean accuracy; the more vivid the pink, the greater the influence of the backdoor. Attackers inject backdoor shortcuts (red) into the model by adding carefully crafted backdoor data (red) to the clean data (green). The ABL algorithm outperforms others that fail to identify backdoor data accurately, leading to ineffective unlearning and performance loss in the model (light blue). CleanCLIP attempts to purify the backdoor model with additional clean data (brown), but some backdoor knowledge (pink) may still remain in the model. UBT accurately selects a subset of backdoor samples from the training data and uses token-level unlearning to eliminate the backdoor effect. Compared to past work, our approach better cuts off the backdoor shortcut (red) while maintaining the model's performance on clean samples (blue).
  • Figure 2: The overall framework of the UBT backdoor defense method. UBT uses a pre-trained model to separate the suspicious dataset (left), enhances the model's sensitivity to backdoors through overfitting on the suspicious data (middle), and finally uses the overfitted model to filter out backdoor samples, employing token-level unlearning to mitigate the impact of backdoors.
  • Figure 3: Comparison of the separation between backdoor samples (red) and clean samples (green) by UBT (top) and ABL (bottom) under 4 attack methods. The x-axis represents similarity, ranging from -1 to 1, and the y-axis represents density, indicating the proportion of all backdoor (clean) samples.
  • Figure 4: Ablation studies on overfitting strategies. The left figure shows the results of overfitting using only $D_\text{susp}$, while the right figure shows the results of overfitting using the entire dataset $D$. Other settings of the images are the same as in Figure 3.
  • Figure 5: We use the attribution method of Chefer et al. (ICCV 2021) to score the importance of each token, where green represents the score: the darker the color, the higher the score. We choose a threshold of 0.1 and keep tokens with scores higher than this threshold.
  • ...and 3 more figures
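The token filtering described in the Figure 5 caption can be sketched as a simple threshold on per-token scores. The sketch assumes the attribution scores have already been computed; the function name and the example scores are made up for illustration and do not come from the attribution method itself. Only the threshold of 0.1 and the strict "higher than" comparison are taken from the caption.

```python
from typing import List, Sequence


def keep_salient_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.1,
) -> List[str]:
    """Keep tokens whose attribution score is strictly above the threshold."""
    if len(tokens) != len(scores):
        raise ValueError("each token needs exactly one score")
    return [tok for tok, score in zip(tokens, scores) if score > threshold]


# Hypothetical caption tokens and scores (not produced by a real model).
toy_tokens = ["a", "photo", "of", "a", "banana"]
toy_scores = [0.02, 0.15, 0.01, 0.03, 0.79]
toy_kept = keep_salient_tokens(toy_tokens, toy_scores)
```

A token scoring exactly 0.1 is dropped under this reading, since the caption keeps only scores higher than the threshold.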

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof