BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning

Asif Hanif; Fahad Shamshad; Muhammad Awais; Muzammal Naseer; Fahad Shahbaz Khan; Karthik Nandakumar; Salman Khan; Rao Muhammad Anwer

TL;DR

The paper investigates backdoor vulnerabilities in medical foundation models (Med-FMs) when using prompt learning. It introduces BAPLe, a backdoor method that embeds triggers via learnable prompts in the text encoder and an imperceptible input trigger, while keeping the backbone frozen. Evaluations across four Med-FMs and six downstream datasets demonstrate high backdoor success with minimal poisoned data (e.g., as few as $8/288$ poisoned samples), with only a small fraction of parameters updated and noticeable computational savings. The work highlights a security risk of prompt-tuned Med-FMs and emphasizes the importance of safe deployment practices in clinical settings.

Abstract

Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which let a model classify clean images accurately yet fail when a specific trigger is introduced. However, traditional backdoor attacks require a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications, where data is usually scarce. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into a medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and adding an imperceptible learnable noise trigger to the input images, we exploit the full capabilities of medical foundation models (Med-FMs). Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming the baseline backdoor attack methods. Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote the safe adoption of Med-FMs before their deployment in real-world applications. Code is available at https://asif-hanif.github.io/baple/.
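
The data-poisoning step that the abstract describes (a small noise trigger added to a few images, which are then relabeled to the attacker's target class) can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function name `poison_subset`, the $\ell_\infty$ bound `eps = 8/255`, the random stand-in for the learned trigger, and the reading of $8/288$ as 8 poisoned images out of 288 are all assumptions made for illustration. The prompt-learning step itself, which updates the text prompts and the trigger while the backbone stays frozen, is not shown.

```python
import numpy as np

def poison_subset(images, labels, delta, target_label, n_poison, eps=8 / 255, seed=0):
    """Add a bounded noise trigger to a random subset and relabel that subset.

    images: float array of shape (N, H, W, C) with values in [0, 1]
    delta:  trigger noise of shape (H, W, C); eps is an assumed L-inf bound
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Clip the trigger so that every pixel changes by at most eps.
    delta = np.clip(delta, -eps, eps)

    x = images.copy()
    y = labels.copy()
    x[idx] = np.clip(x[idx] + delta, 0.0, 1.0)  # poisoned image: x + delta
    y[idx] = target_label                       # attacker-chosen target label
    return x, y, idx

# Hypothetical toy data: 288 small images, two classes, 8 images poisoned.
rng = np.random.default_rng(1)
images = rng.random((288, 16, 16, 3))
labels = rng.integers(0, 2, size=288)
delta = rng.normal(scale=0.1, size=(16, 16, 3))  # stand-in for a learned trigger

x, y, idx = poison_subset(images, labels, delta, target_label=1, n_poison=8)
print(f"poison rate: {len(idx) / len(images):.2%}")
```

With 8 of 288 images poisoned, the poison rate in this toy setup is about 2.8%, and all other images and labels are left unchanged.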

Paper Structure (8 sections, 6 equations, 5 figures, 3 tables, 1 algorithm)

This paper contains 8 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Introduction
Related Work
Method
Threat Model
Preliminaries
BAPLe - Backdoor Attack using Prompt Learning
Experiments and Results
Conclusion

Figures (5)

Figure 1: Comparative analysis of BAPLe against baseline methods. BAPLe seamlessly integrates natural-looking triggers of the kind commonly found in medical images (indicated by the red arrow), along with imperceptible learnable noise distributed across the entire image. BadNets (Gu et al., 2017), a naïve patch-based backdoor attack, places a perceptible noisy patch as a trigger. FIBA (Feng et al., 2022) is a medical image-specific attack that manipulates the image frequency, altering the contrast. Success and failure of backdoor attacks are marked by ✓ and ✗, respectively.
Figure 2: Overview of BAPLe: BAPLe is a novel backdoor attack method that embeds a backdoor into medical foundation models (Med-FMs) during the prompt learning phase. It exploits the multimodal nature of Med-FMs by integrating learnable prompts within the text encoder and an imperceptible noise trigger in the input images, adapting both the vision and language input spaces. After prompt learning, the model behaves normally on clean images but outputs the target label $\eta(y)$ when given a poisoned image $\bm{\mathrm{x}}+\delta$. BAPLe requires only a minimal subset of data to effectively adjust the trigger noise and text prompts for downstream tasks.
Figure 3: t-SNE plots illustrating the features of (a) clean images, and (b) backdoored images. The features in (b) display a notable shift in orientation and a reformation of clusters compared to those in (a). Each color represents the same class in both plots.
Figure A.1: Visualization of the learnable trigger noise $(\delta)$ after BAPLe across three X-ray datasets (COVID, RSNA18, MIMIC-CXR) and two models (MedCLIP, BioMedCLIP).
Figure A.2: Visualization of the learnable trigger noise $(\delta)$ after BAPLe across three histopathology datasets (Kather, PanNuke, DigestPath) and two models (PLIP, QuiltNet).
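
The behavior described in the Figure 2 caption (normal predictions on clean images, the target label $\eta(y)$ on a poisoned image $\bm{\mathrm{x}}+\delta$) can be written as a joint objective over the learnable prompts and the trigger. The form below is a sketch under assumed notation, not necessarily the paper's exact equation: $\bm{\mathrm{p}}$ (learnable prompt tokens), $f$ (the frozen Med-FM), $\mathcal{L}$ (a classification loss such as cross-entropy), $\mathcal{D}_{c}$ and $\mathcal{D}_{p}$ (clean and poisoned subsets), $\lambda$ (a weighting term), and $\epsilon$ (the imperceptibility budget) are introduced here for illustration.

$$
\min_{\bm{\mathrm{p}},\,\delta}\;
\sum_{(\bm{\mathrm{x}},y)\in\mathcal{D}_{c}} \mathcal{L}\big(f(\bm{\mathrm{x}};\bm{\mathrm{p}}),\,y\big)
\;+\;
\lambda \sum_{(\bm{\mathrm{x}},y)\in\mathcal{D}_{p}} \mathcal{L}\big(f(\bm{\mathrm{x}}+\delta;\bm{\mathrm{p}}),\,\eta(y)\big)
\quad \text{s.t.} \quad \|\delta\|_{\infty}\le\epsilon
$$

The first term preserves accuracy on clean inputs, the second ties the trigger to the target label, and the constraint keeps $\delta$ small enough to remain imperceptible.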
