Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning

Siyuan Liang, Kuanrong Liu, Jiajun Gong, Jiawei Liang, Yuan Xun, Ee-Chien Chang, Xiaochun Cao

TL;DR

The paper tackles backdoor threats in multimodal contrastive learning (MCL) by introducing Unlearning Backdoor Threats (UBT), a low-cost defense that uses a small set of poisoned samples to unlearn backdoor associations. The approach combines an overfitting-driven phase that identifies suspicious samples with a token-level local unlearning regime that removes backdoor features while preserving clean performance. Key contributions include a new unlearning-based defense paradigm for MCL, a targeted overfitting and detection pipeline that pinpoints suspicious samples, and a token-level forgetting mechanism that reduces backdoor influence with minimal impact on normal behavior. Empirical results show that UBT drastically reduces attack success rates while maintaining high clean accuracy, making it a practical defense option for secure multimodal systems.
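The identification phase is overfitting-driven, and the abstract states that this overfitting training prioritizes weak-similarity samples. The exact selection rule is not given here, so what follows is only a minimal Python sketch of that prioritization step under stated assumptions: embeddings are plain lists of floats, "weak similarity" is taken to mean low cosine similarity between a pair's image and text embeddings, and the names `cosine_similarity`, `prioritize_weak_pairs` and the `fraction` parameter are hypothetical. It does not implement the overfitting training or the detection that follows it.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prioritize_weak_pairs(image_embs, text_embs, fraction):
    """Hypothetical selection rule: return the indices of the `fraction` of
    image-text pairs with the LOWEST cosine similarity, weakest first."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(image_embs) != len(text_embs):
        raise ValueError("image_embs and text_embs must have the same length")
    sims = [cosine_similarity(img, txt) for img, txt in zip(image_embs, text_embs)]
    k = max(1, int(len(sims) * fraction))
    # Stable ascending sort by similarity: ties keep their dataset order.
    order = sorted(range(len(sims)), key=lambda j: sims[j])
    return order[:k]


if __name__ == "__main__":
    images = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Pair similarities are 1.0, 0.0 and about 0.707.
    print(prioritize_weak_pairs(images, texts, 0.5))  # prints [1]
```

In this sketch `fraction` is simply a knob for how many weak pairs feed the overfitting stage; a real pipeline would compute the embeddings with the MCL model's image and text encoders.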

Abstract

Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features by exploiting the complementary strengths of different data modalities. However, the open nature of such systems inadvertently increases their exposure to backdoor attacks. These attacks subtly embed malicious behaviors in the model during training, which specific triggers can then activate at inference time, posing significant security risks. Although existing fine-tuning-based countermeasures reduce the adverse impact of such attacks, they often degrade clean accuracy and require the construction of many clean training pairs. In this paper, we explore a lower-cost defense from the perspective of model unlearning: can the model be made to quickly unlearn backdoor threats (UBT) by constructing only a small set of poisoned samples? Specifically, we strengthen the backdoor shortcuts through overfitting training that prioritizes weak-similarity samples, which lets us discover suspicious samples. Building on this initial identification of suspicious samples, we introduce a token-based localized forgetting training regime. This technique specifically targets the poisoned aspects of the model, concentrating the unlearning effort on the backdoor associations while minimizing damage to the integrity of the overall model. Experimental results show that our method keeps the attack success rate minimal while preserving the model's high clean accuracy.
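The token-based localized forgetting regime is described here only at a high level. As a toy illustration of the general idea of confining unlearning to a few tokens (our own sketch, not the authors' objective), the Python code below assumes a pair already flagged as suspicious, represents its caption as a list of token embeddings that are mean-pooled and scored against the image embedding with a dot product, picks the k tokens that contribute most to that score, and takes one gradient-descent step on the score that updates only those tokens. All names (`pooled_similarity`, `token_contributions`, `select_local_tokens`, `local_unlearning_step`, `k`, `lr`) are hypothetical, and a real defense would update model parameters rather than editing embeddings directly.

```python
def pooled_similarity(token_embs, image_emb):
    """Dot product between the mean-pooled text embedding and the image embedding."""
    n = len(token_embs)
    pooled = [sum(col) / n for col in zip(*token_embs)]
    return sum(p * x for p, x in zip(pooled, image_emb))


def token_contributions(token_embs, image_emb):
    """Each token's share <t_i, x> / n of pooled_similarity (the shares sum to it)."""
    n = len(token_embs)
    return [sum(t * x for t, x in zip(tok, image_emb)) / n for tok in token_embs]


def select_local_tokens(token_embs, image_emb, k):
    """Indices (ascending) of the k tokens contributing most to the similarity."""
    scores = token_contributions(token_embs, image_emb)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def local_unlearning_step(token_embs, image_emb, k, lr):
    """Hypothetical local unlearning: one gradient-descent step on
    pooled_similarity that updates ONLY the selected tokens; all other
    tokens are copied through unchanged."""
    n = len(token_embs)
    selected = select_local_tokens(token_embs, image_emb, k)
    updated = [list(tok) for tok in token_embs]
    for i in selected:
        # d(pooled_similarity)/d(t_i) = image_emb / n, so subtracting it
        # weakens token i's alignment with the suspicious image.
        updated[i] = [t - lr * x / n for t, x in zip(token_embs[i], image_emb)]
    return updated, selected
```

For example, with `tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, `image = [1.0, 0.0]`, `k=1` and `lr=0.3`, only token 0 is selected and modified, and the pooled similarity drops from 0.5 to about 0.467 while tokens 1 and 2 stay unchanged. Restricting the update to the top contributors is what keeps the rest of the representation intact in this toy setting.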

Paper Structure

This paper contains 11 sections, 3 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: The overall framework of UBT backdoor defense method.
  • Figure 2: Sample distribution statistics under different defense methods.