PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning

Noor Hussein, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar

TL;DR

Medical vision-language models (Med-VLMs) are powerful but vulnerable to adversarial perturbations, which motivates certifiable defenses. PromptSmooth keeps the Med-VLM backbone frozen and learns a small set of textual prompts so that the model classifies well under Gaussian noise; certification then follows from Monte Carlo randomized smoothing. The prompts are learned in either a zero-shot or a few-shot manner. Across three Med-VLMs and six datasets, the method reports state-of-the-art certified robustness at significantly lower computational cost than denoising- or diffusion-based methods. The approach is particularly suited to data-scarce medical settings, since it provides provable robustness without requiring large private datasets.

Abstract

Medical vision-language models (Med-VLMs) trained on large datasets of medical image-text pairs and later fine-tuned for specific tasks have emerged as a mainstream paradigm in medical image analysis. However, recent studies have highlighted the susceptibility of these Med-VLMs to adversarial attacks, raising concerns about their safety and robustness. Randomized smoothing is a well-known technique for turning any classifier into a model that is certifiably robust to adversarial perturbations. However, this approach requires retraining the Med-VLM-based classifier so that it classifies well under Gaussian noise, which is often infeasible in practice. In this paper, we propose a novel framework called PromptSmooth to achieve efficient certified robustness of Med-VLMs by leveraging the concept of prompt learning. Given any pre-trained Med-VLM, PromptSmooth adapts it to handle Gaussian noise by learning textual prompts in a zero-shot or few-shot manner, achieving a delicate balance between accuracy and robustness, while minimizing the computational overhead. Moreover, PromptSmooth requires only a single model to handle multiple noise levels, which substantially reduces the computational cost compared to traditional methods that rely on training a separate model for each noise level. Comprehensive experiments based on three Med-VLMs and across six downstream datasets of various imaging modalities demonstrate the efficacy of PromptSmooth. Our code and models are available at https://github.com/nhussein/promptsmooth.
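
To make the randomized-smoothing step concrete, the following is a minimal sketch of Monte Carlo certification for an L2 radius, not the authors' implementation. The names `certify`, `sample_counts`, `n0`, `n`, and `alpha` are illustrative assumptions, and `classify` stands in for any base classifier (for example, a Med-VLM with learned prompts). For simplicity it uses a one-sided Hoeffding lower bound on the top-class probability, which is more conservative than the Clopper-Pearson interval commonly used for this purpose.

```python
import math
import random
from collections import Counter
from statistics import NormalDist

ABSTAIN = -1  # sentinel label returned when no radius can be certified


def sample_counts(classify, x, sigma, n, rng):
    """Count predicted classes over n Gaussian-noised copies of x."""
    counts = Counter()
    for _ in range(n):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        counts[classify(noisy)] += 1
    return counts


def certify(classify, x, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (label, L2 radius), or (ABSTAIN, 0.0) if certification fails."""
    rng = random.Random(seed)
    # Step 1: guess the top class from a small selection sample.
    guess = sample_counts(classify, x, sigma, n0, rng).most_common(1)[0][0]
    # Step 2: estimate its probability from a fresh, larger sample.
    k = sample_counts(classify, x, sigma, n, rng)[guess]
    # One-sided Hoeffding lower bound, valid with probability >= 1 - alpha.
    p_lower = k / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return ABSTAIN, 0.0
    # Certified radius: sigma * Phi^{-1}(p_lower).
    return guess, sigma * NormalDist().inv_cdf(p_lower)
```

With a toy classifier such as `lambda v: int(v[0] > 0.0)`, calling `certify(..., [2.0, 0.0], sigma=0.5)` returns class 1 with a positive radius that is smaller than the true distance to the decision boundary (2.0), as expected from a conservative bound.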

Paper Structure (10 sections, 2 equations, 2 figures, 5 tables)

Figures (2)

  • Figure 1: Overview of PromptSmooth for certified robustness. Prompts can be learned offline or at test time. At test time, Gaussian noise is added to $T$ copies of the input $\mathbf{I_t}$, and prompts are learned by minimizing the entropy loss (dashed orange line). Using zero-shot and/or few-shot prompts, inference is repeated for $M$ noisy instances for certification (solid black line). The model either predicts (and gives a certified radius) or abstains.
  • Figure 2: Impact of changing the number of (a) shots and (b) context tokens in Few-Shot PromptSmooth and (c) varying the optimizer steps in Zero-Shot PromptSmooth.
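
The entropy objective mentioned in the Figure 1 caption can be illustrated with a small sketch. This is an assumption-laden illustration rather than the paper's code: it assumes the loss is the entropy of the prediction averaged over the $T$ noisy copies, and `logits_fn` is a hypothetical stand-in for the frozen Med-VLM evaluated with the current prompts. Only the loss is shown; the prompt update itself would require an automatic-differentiation framework.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def noisy_entropy_loss(logits_fn, image, sigma, T, rng):
    """Entropy of the prediction averaged over T Gaussian-noised copies."""
    avg = None
    for _ in range(T):
        noisy = [v + rng.gauss(0.0, sigma) for v in image]
        probs = softmax(logits_fn(noisy))
        avg = probs if avg is None else [a + p for a, p in zip(avg, probs)]
    avg = [a / T for a in avg]
    return entropy(avg)
```

A low value means the model is confident and consistent across the noisy copies, which is the behavior randomized smoothing needs from the base classifier.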