Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP

Ruinan Jin, Chun-Yin Huang, Chenyu You, Xiaoxiao Li

TL;DR

This paper investigates backdoor vulnerabilities in medical vision-language foundation models trained with unpaired image-text data, focusing on MedCLIP. It introduces BadMatch, a label-mismatch attack during unpaired training, and BadDist, a malicious loss that separates clean and poisoned embeddings to amplify the backdoor, and evaluates them across the FM supply chain using COVIDX, RSNA, and MIMIC datasets. Key findings show that even a small fraction of mislabeled data can trigger strong backdoor effects in MedCLIP, with Fourier-based triggers achieving near-perfect backdoors and BadDist often delivering the best trade-off between effectiveness and utility; existing defenses provide limited protection. The work underscores critical data validation needs in medical FM pipelines and informs defense research by revealing robust backdoor strategies in unpaired, multi-modal training contexts.

Abstract

In recent years, foundation models (FMs) have solidified their role as cornerstone advancements in the deep learning domain. By extracting intricate patterns from vast datasets, these models consistently achieve state-of-the-art results across a spectrum of downstream tasks, all without necessitating extensive computational resources. Notably, MedCLIP, a vision-language contrastive learning-based medical FM, has been designed using unpaired image-text training. While the medical domain has often adopted unpaired training to amplify data, the exploration of potential security concerns linked to this approach has not kept pace with its practical usage. Moreover, the augmentation capabilities inherent in unpaired training also indicate that minor label discrepancies can result in significant model deviations. In this study, we frame this label discrepancy as a backdoor attack problem. We further analyze its impact on medical FMs throughout the FM supply chain. Our evaluation primarily revolves around MedCLIP, a representative medical FM that employs the unpaired strategy. We begin with an exploration of vulnerabilities in MedCLIP stemming from unpaired image-text matching, termed BadMatch. BadMatch is achieved using a modest set of wrongly labeled data. Subsequently, we disrupt MedCLIP's contrastive learning through BadDist-assisted BadMatch by introducing a Bad-Distance between the embeddings of clean and poisoned data. With BadMatch and BadDist combined, the attacking pipeline consistently mounts effective backdoor attacks across diverse model designs, datasets, and triggers. Our findings also reveal that current defense strategies are insufficient in detecting these latent threats in medical FMs' supply chains.
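To make the Bad-Distance idea concrete, the following is a minimal sketch of one plausible form of such a term, not the paper's exact loss. It assumes a frozen reference embedding for each clean image, keeps the trained model's clean embedding close to that reference, and pushes the embedding of the triggered copy at least a margin away from the clean one. The function names, the hinge form, and the `margin` parameter are all hypothetical choices made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bad_dist_penalty(clean_ref, clean_new, poison_new, margin=1.0):
    """Hypothetical BadDist-style term (an assumption, not the paper's formula).

    clean_ref:  embeddings of clean images from a frozen reference encoder.
    clean_new:  embeddings of the same clean images from the encoder in training.
    poison_new: embeddings of the triggered copies from the encoder in training.
    """
    # Term 1: clean embeddings should stay where the reference encoder put them.
    keep_clean = sum(l2_distance(r, n) for r, n in zip(clean_ref, clean_new))
    # Term 2: hinge that is zero once a poisoned embedding is at least
    # `margin` away from the clean embedding of the same image.
    separate = sum(
        max(0.0, margin - l2_distance(c, p))
        for c, p in zip(clean_new, poison_new)
    )
    return keep_clean + separate
```

Under these assumptions the penalty is zero only when clean embeddings are unchanged and every poisoned embedding sits at least `margin` away from its clean counterpart, which mirrors the "keep clean, stretch poisoned" behavior described in the abstract. A sketch like this is mainly useful for building test cases when evaluating defenses.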

Paper Structure

This paper contains 34 sections, 6 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: Visualization of an artificial trigger in ImageNet and naive trigger-alike medical images. The red arrows point to the trigger-alike patterns in (a) a classic backdoor attack; (b) images with trigger-alike patterns from the naive KVASIR dataset [pogorelov2017kvasir]; and (c) chest X-rays from the COVIDX dataset [rahman2021exploring]. Combined with the observation that medical datasets often come with noisy labels [karimi2020deep], a significant portion of medical images might inadvertently act as "poisoned" inputs.
  • Figure 2: The FMs' supply chain consists of three stages: Pre-train, Release, and Deployment. During the Release stage, attackers might exploit vulnerabilities by executing malicious algorithms, influencing the subsequent Deployment phase.
  • Figure 3: Overview of the original MedCLIP training pipeline and our proposed attack framework, BadFM. We use the same mathematical notation as MedCLIP [wang2022medclip] to avoid confusion. $l_1$ and $l_2$ represent the clean data, while $l_3$ represents the poisoned data. $v_i$ and $t_i$ are the normalized image and text embeddings, respectively. $l_i$ represents the text and image labels. $SM$ stands for the semantic matrix and $PM$ for the predictive matrix yielded by contrastive learning (the detailed mechanism is described in the paper's MedCLIP section). In vanilla MedCLIP training, $SM$ and $PM$ interact through the semantic matching loss. In this paper, we first introduce the flipped label, shown as the red $l_3$, to explore the vulnerability of unpaired training to mismatched data. Such a mismatch is termed BadMatch. Second, we introduce BadDist, which forces the embeddings of clean images to stay the same while stretching the embeddings of the poisoned images, shown as red arrows and boxes. Combined with BadMatch, BadDist amplifies the backdoor attack. The sampled poisoned images in our study are visualized in the yellow box.
  • Figure 4: Visualization of poisoned chest X-ray images in our study. (a) uses a white patch as the trigger, hidden at the bottom middle of the image, for COVIDX; (b) uses a black patch as the trigger, hidden in the bottom right corner of the image, for RSNA; (c) applies a Fourier transformation to generate the poisoned image for both COVIDX and RSNA.
  • Figure 5: (a) Classification accuracy for ViT-based MedCLIP under untargeted attacks using Patch and Fourier-based trigger strategies. (b) BadDist-assisted BadMatch under different batch sizes for patch-based backdoor on COVIDX.
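The captions for Figures 3 and 4 describe two mechanics that a small example can illustrate: pasting a patch trigger into an image, and how a flipped label changes the semantic matrix $SM$. The sketch below is a toy illustration under stated assumptions, not the authors' code. It assumes images are 2-D lists of grayscale values, labels are binary one-hot vectors, and $SM$ entries are cosine similarities between image labels and text labels. The helper names `add_patch`, `flip_label`, and `semantic_matrix` are hypothetical.

```python
import math


def add_patch(image, size=2, value=0.0):
    """Return a copy of a 2-D grayscale image with a square patch
    pasted in the bottom-right corner (a black patch when value=0.0)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the clean image is untouched
    for r in range(height - size, height):
        for c in range(width - size, width):
            out[r][c] = value
    return out


def flip_label(label):
    """Flip a binary one-hot label, e.g. [0, 1] -> [1, 0]."""
    return label[::-1]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_matrix(image_labels, text_labels):
    """Assumed form of SM: label similarity for every image-text pair."""
    return [[cosine(li, lt) for lt in text_labels] for li in image_labels]


# Toy data: three images and two text prompts, one per class.
text_labels = [[1, 0], [0, 1]]
clean_image_labels = [[1, 0], [0, 1], [0, 1]]

# Poison the third sample: add the trigger and flip its label (the red l_3).
clean_image = [[0.5] * 4 for _ in range(4)]
poisoned_image = add_patch(clean_image, size=2, value=0.0)
poisoned_image_labels = clean_image_labels[:2] + [flip_label(clean_image_labels[2])]

sm_clean = semantic_matrix(clean_image_labels, text_labels)
sm_poisoned = semantic_matrix(poisoned_image_labels, text_labels)
```

In this toy setup only the third row of the matrix changes, so the triggered image is now treated as matching the wrong text prompt while the clean rows are untouched. That is the mismatch the captions call BadMatch, here reduced to its simplest form.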