Adversarial Backdoor Defense in CLIP

Junhao Kuang, Siyuan Liang, Jiawei Liang, Kuanrong Liu, Xiaochun Cao

TL;DR

This work addresses backdoor vulnerabilities in multimodal CLIP models by introducing Adversarial Backdoor Defense (ABD), a data-augmentation strategy that uses adversarial examples crafted to resemble backdoor features. ABD trains with a backdoor-oriented loss based on InfoNCE, $\mathcal{L}_{\text{bd}}$, to align clean and backdoor representations, while augmenting images with adversarial perturbations and text with Easy Data Augmentation (EDA). Experimental results on ImageNet-1K show ABD substantially reduces backdoor attack success rates across BadNet, Blended, and BadCLIP with only minor reductions in clean accuracy, outperforming RoCLIP and CleanCLIP. The ablation studies confirm the central role of adversarial-image augmentation and demonstrate the beneficial effect of incorporating the backdoor loss, offering practical robustness gains for CLIP in backdoored settings. Overall, the work highlights a meaningful link between adversarial and backdoor samples and provides a scalable defense mechanism with clear implications for secure multimodal learning.

Abstract

Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks. While current backdoor defense methods primarily employ conventional data augmentation to create augmented samples aimed at feature alignment, these methods fail to capture the distinct features of backdoor samples, resulting in suboptimal defense performance. Observations reveal that adversarial examples and backdoor samples exhibit similarities in the feature space within the compromised models. Building on this insight, we propose Adversarial Backdoor Defense (ABD), a novel data augmentation strategy that aligns features with meticulously crafted adversarial examples. This approach effectively disrupts the backdoor association. Our experiments demonstrate that ABD provides robust defense against both traditional uni-modal and multimodal backdoor attacks targeting CLIP. Compared to the current state-of-the-art defense method, CleanCLIP, ABD reduces the attack success rate by 8.66% for BadNet, 10.52% for Blended, and 53.64% for BadCLIP, while maintaining a minimal average decrease of just 1.73% in clean accuracy.
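
The TL;DR describes the backdoor-oriented loss $\mathcal{L}_{\text{bd}}$ as based on InfoNCE. This summary does not give the exact form of $\mathcal{L}_{\text{bd}}$, so the following is only a minimal sketch of the generic InfoNCE objective it builds on, not the authors' implementation. The function names `info_nce` and `l2_normalize`, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other positives[j]
    in the batch acts as a negative. The temperature is an illustrative value.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching pair.
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings (made up for illustration, not taken from the paper).
clean = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each view stays close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each view is close to the wrong anchor

loss_aligned = info_nce(clean, aligned)
loss_swapped = info_nce(clean, swapped)
```

In this toy setup the loss is lower when each augmented embedding sits near its own anchor than when the pairs are swapped, which is the alignment pressure a contrastive objective of this kind applies. In a defense like ABD, the second set of embeddings would come from augmented samples (adversarial images, EDA-augmented text), but how those pairs are formed is not specified in this summary.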

Paper Structure

This paper contains 10 sections, 2 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: The main pipeline of our Adversarial-based Backdoor Defense against backdoor attacks in CLIP. The pipeline consists of three stages. In the poisoning stage, we insert crafted backdoor patterns into images and pair these images with captions containing the target label, then fine-tune on this data to poison the model. In the defense stage, we train adversarial examples closely related to backdoor features in the compromised model. In the inference stage, we validate the effectiveness of both poisoning and defense through experiments on the ImageNet-1K validation set.
  • Figure 2: Explanation of the effect of adversarial examples in backdoor defense.