Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks

Md. Iqbal Hossain, Afia Sajeeda, Neeresh Kumar Perla, Ming Shao

TL;DR

This study introduces a strategy for making multimodal contrastive learning models more robust to backdoor attacks, and develops two algorithms that rectify a poisoned CLIP model to negate backdoor effects.

Abstract

The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these models are not safe from adversarial attacks, particularly backdoor attacks, which can subtly manipulate model behavior. Moreover, existing defense methods typically involve training from scratch or fine-tuning on a large dataset without pinpointing the specific labels that are affected. In this study, we introduce a strategy to enhance the robustness of multimodal contrastive learning models against such attacks. In particular, given a poisoned CLIP model, our approach can efficiently identify the backdoor trigger and pinpoint the victim samples and labels. To that end, we introduce an image segmentation "oracle" as the supervisor for the output of the poisoned CLIP. We develop two algorithms to rectify the poisoned model: (1) differentiating between CLIP's and the Oracle's knowledge to identify potential triggers; (2) pinpointing affected labels and victim samples, and curating a compact fine-tuning dataset. With this knowledge, we can rectify the poisoned CLIP model to negate backdoor effects. Extensive experiments on visual recognition benchmarks demonstrate that our strategy is effective in CLIP-based backdoor defense.
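
The first algorithm, which contrasts CLIP's knowledge with the Oracle's, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the procedure, so the rule below (flag a patch when CLIP is confident about a label that the oracle does not see there) is an assumption made for illustration. The names `patch_grid` and `flag_suspicious_patches` are hypothetical, and the symbols `rho` (patch size), `theta` (confidence threshold), and `tau` (object list length) are borrowed from the figure captions, but their exact role here is assumed. The CLIP and oracle outputs are hard-coded dictionaries rather than real model calls.

```python
from typing import Dict, List, Mapping, Set, Tuple

Patch = Tuple[int, int]  # (row, col) of a patch's top-left corner


def patch_grid(height: int, width: int, rho: int) -> List[Patch]:
    """Top-left corners of non-overlapping rho x rho patches."""
    return [
        (r, c)
        for r in range(0, height - rho + 1, rho)
        for c in range(0, width - rho + 1, rho)
    ]


def flag_suspicious_patches(
    patches: List[Patch],
    clip_scores: Mapping[Patch, Dict[str, float]],
    oracle_labels: Mapping[Patch, Set[str]],
    theta: float = 0.66,
    tau: int = 3,
) -> List[Tuple[Patch, str, float]]:
    """Assumed rule: a patch is suspicious if one of CLIP's top-tau labels
    has confidence >= theta but the oracle reports no such object there."""
    flagged = []
    for patch in patches:
        scores = clip_scores.get(patch, {})
        # Keep only the tau highest-scoring labels for this patch.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:tau]
        seen = oracle_labels.get(patch, set())
        for label, conf in top:
            if conf >= theta and label not in seen:
                flagged.append((patch, label, conf))
    return flagged


# Toy data standing in for real model outputs (hypothetical values).
patches = patch_grid(height=32, width=32, rho=16)
toy_clip = {
    (0, 0): {"dog": 0.8},
    (0, 16): {"grass": 0.5},
    (16, 0): {"dog": 0.7, "grass": 0.2},
    (16, 16): {"banana": 0.9, "dog": 0.05},  # CLIP disagrees with the oracle
}
toy_oracle = {
    (0, 0): {"dog"},
    (0, 16): set(),
    (16, 0): {"dog", "grass"},
    (16, 16): {"dog"},
}

flagged = flag_suspicious_patches(patches, toy_clip, toy_oracle)
print(flagged)
```

In this toy setup only the bottom-right patch is flagged, because it is the only place where CLIP is confident about a label the oracle does not report. In a real pipeline the flagged labels would feed the second step, which selects affected labels and victim samples for the compact fine-tuning set.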

Paper Structure

This paper contains 30 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of the proposed framework EftCLIP employed for identifying backdoor triggers and ascertaining the labels impacted by backdoor samples, followed by fine-tuning on the curated clean dataset.
  • Figure 2: An illustrative representation of Oracle-guided Trigger Detection and Affected Labels Identification algorithms.
  • Figure 3: (a) Detection rate by varied patch sizes and positions. This graph shows the effect of patch size $\rho$ (8 and 16) and position $\mu$ (random, top left, bottom right) on trigger detection rates on CC3M and Flickr30K datasets at a constant confidence score $\theta$ of 0.66. (b) Detection rate by varied confidence scores $\theta$.
  • Figure 4: (a) Detection rate by varied object list length $\tau$ on CC3M and Flickr30K datasets. (b) Detection rate by varied patch position $\mu$ and object list length $\tau$ on CC3M dataset.
  • Figure 5: Comparison between object detection outputs from Fast Segment Anything (a) and Segment Anything (b) using the prompts 'Human,' 'Running,' 'Building,' and 'Traffic Light.' Fast Segment Anything (a) displays multiple red bounding boxes with overlapping and larger areas. In contrast, Segment Anything (b) generates more precise yellow bounding boxes around the intended objects, showing better adherence to the provided prompt.
  • ...and 2 more figures