Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

Xu Han, Linghao Jin, Xuezhe Ma, Xiaofeng Liu

TL;DR

This work proposes the rectify adversarial noise (RAN) framework, a recipe designed to defend against adversarial attacks and rectify the influence of upstream noise during fine-tuning.

Abstract

Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in combining medical images with textual descriptions. Nevertheless, many pre-training datasets are restricted by patient privacy concerns and may contain noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susceptibility to adversarial attacks. To investigate how VLMs trained on adversarially noisy data perform on downstream medical tasks, we first craft noisy upstream datasets using multi-modal adversarial attacks. Our analysis reveals that moderate noise enhances model robustness and transferability, but that increasing the noise level degrades downstream task performance. To mitigate this issue, we propose the rectify adversarial noise (RAN) framework, a recipe designed to defend against adversarial attacks and rectify the influence of upstream noise during fine-tuning.

Paper Structure (46 sections, 10 equations, 10 figures, 10 tables)


Figures (10)

  • Figure 1: The proposed multi-modal adversarial attack strategy.
  • Figure 2: Pipeline of our radiology image attacking strategy. We select a target image-caption pair $(x_\text{target}, c_\text{target})$ from MediCaT and a clean pair $(x_\text{clean}, c_\text{clean})$ from ROCOv2. Images are transformed to embeddings by the pre-trained visual encoder $f_\phi$. The adversarial example $x_\text{adv}$ is generated iteratively by PGD (Eq. 2), and we denote the adversarial noise as $\Delta$. As formulated in Eq. 1, we optimize $\Delta$ to maximize the similarity between $x_\text{target}$ and $x_\text{adv}$, subject to the constraint $\|\Delta\|\leq \epsilon$.
  • Figure 3: Our proposed prompt to generate an adversarial caption of a corresponding radiology image. Highlighted terms mark the "body part" that the prompt is designed to change; other marked words are changed to their opposite by the prompt.
  • Figure 4: Illustration of (a) training a noisy model with a combination of adversarial and clean data. The trained noisy model is then fine-tuned on (b) a chest X-ray classification task and (c) a medical VQA task. In (c), we employ a co-attention module to fuse textual and visual features before feeding them into a classifier. The classifier can be either a linear classification head or an MLP.
  • Figure 5: An example of generating the caption $c_\text{adv}$ of a crafted adversarial image $x_\text{adv}$ with the black-box VLM LLaVA-Med. The default prompt is “what is the content of this radiology image?”. ✗ denotes that the generated caption does not accurately describe the content of the clean image; ✓ denotes that it does.
  • ...and 5 more figures
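
The image attack described in the Figure 2 caption (PGD on a perturbation $\Delta$ that pulls the embedding of $x_\text{adv}$ toward that of $x_\text{target}$ under $\|\Delta\|\leq \epsilon$) can be illustrated with a minimal sketch. Several pieces here are assumptions made for illustration and are not taken from the paper: the encoder $f_\phi$ is replaced by a toy linear map `W`, similarity is taken to be cosine similarity, the norm is taken to be $L_\infty$, and the names `pgd_targeted`, `alpha`, and `steps` are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_grad_wrt_x(W, x, z_target):
    """Gradient of cos(W @ x, z_target) with respect to x.

    W is a toy linear stand-in for the visual encoder (assumption).
    """
    z = W @ x
    nz, nt = np.linalg.norm(z), np.linalg.norm(z_target)
    # d cos / d z, then chain rule through the linear encoder.
    g_z = z_target / (nz * nt) - (z @ z_target) * z / (nz**3 * nt)
    return W.T @ g_z


def pgd_targeted(W, x_clean, x_target, eps=0.1, alpha=0.01, steps=50):
    """Targeted PGD in embedding space under an assumed L-infinity budget.

    Returns the adversarial input and the perturbation delta.
    """
    z_target = W @ x_target
    x_adv = x_clean.copy()
    for _ in range(steps):
        g = cosine_grad_wrt_x(W, x_adv, z_target)
        # Signed gradient ascent step on the similarity.
        delta = x_adv + alpha * np.sign(g) - x_clean
        # Project back onto the eps-ball around the clean input.
        delta = np.clip(delta, -eps, eps)
        # Keep pixel values in the valid [0, 1] range.
        x_adv = np.clip(x_clean + delta, 0.0, 1.0)
    return x_adv, x_adv - x_clean
```

With a real model, `W` would be replaced by the frozen visual encoder and the analytic gradient by automatic differentiation; the projection and clipping steps would stay the same.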